Doc2Vec.infer_vector: AttributeError: 'Doc2Vec' object has no attribute 'syn1' #483

codingluke · 2015-10-16T09:39:17Z

Hi all,
I successfully trained a Doc2Vec model on the data from the Kaggle tutorial "Bag of Words Meets Bags of Popcorn" https://www.kaggle.com/c/word2vec-nlp-tutorial/data. The methods most_similar and doesnt_match work as expected.

However, when I use the infer_vector method, I get the error AttributeError: 'Doc2Vec' object has no attribute 'syn1'. When I inspect the model, only model.syn0 is available.

Systeminfo

MacOSX 10.10.5,
Python 2.7.10

Packages (I don't use Cyclone at the moment...)

boto (2.38.0)
bz2file (0.98)
gensim (0.12.2)
httpretty (0.8.6)
numpy (1.10.1)
pip (7.1.2)
requests (2.8.1)
scipy (0.16.0)
setuptools (18.2)
six (1.10.0)
smart-open (1.3.0)
wheel (0.24.0)

Example in IPython

In [5]: from gensim.models import Doc2Vec
In [6]: model = Doc2Vec.load('./Doc2Vec300features_40minwords_10context')
In [7]: model.infer_vector("hallo ich bin ein text".split())
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-c4a827fd56d1> in <module>()
----> 1 model.infer_vector("hallo ich bin ein text".split())

/Users/{myuser}/Documents/Dev/virt_env2/lib/python2.7/site-packages/gensim/models/doc2vec.pyc in infer_vector(self, doc_words, alpha, min_alpha, steps)
    694                 train_document_dm(self, doc_words, doctag_indexes, alpha, work, neu1,
    695                                   learn_words=False, learn_hidden=False,
--> 696                                   doctag_vectors=doctag_vectors, doctag_locks=doctag_locks)
    697             alpha = ((alpha - min_alpha) / (steps - i)) + min_alpha
    698

/Users/{myuser}/Documents/Dev/virt_env2/lib/python2.7/site-packages/gensim/models/doc2vec_inner.pyx in gensim.models.doc2vec_inner.train_document_dm (./gensim/models/doc2vec_inner.c:4736)()
    419
    420     if hs:
--> 421         syn1 = <REAL_t *>(np.PyArray_DATA(model.syn1))
    422
    423     if negative:

AttributeError: 'Doc2Vec' object has no attribute 'syn1'

THX for your Help! :)


codingluke · 2015-10-16T10:28:28Z

I could work around the error in two ways:

setting the parameter hs=0 when initializing the model, or
not calling model.init_sims()

I'm not a deep expert on this topic, but hs (hierarchical softmax for training) seems important to me, doesn't it? Also, init_sims, which "freezes" the model so that it's faster and smaller, is a good thing.

codingluke · 2015-10-16T13:02:00Z

I just figured it out: it also works with init_sims(replace=False).

In [9]: from gensim.models import Doc2Vec
In [10]: model = Doc2Vec.load('./Doc2VecMini300features_40minwords_10context')
In [11]: type(model.syn1)
Out[11]: numpy.ndarray
In [12]: model.init_sims(replace=True)
In [13]: type(model.syn1)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-5f166d4ec376> in <module>()
----> 1 type(model.syn1)

AttributeError: 'Doc2Vec' object has no attribute 'syn1'

init_sims(replace=True) seems to delete the attribute syn1 from the model, which infer_vector needs.

codingluke · 2015-10-16T13:25:58Z

I think it's clear now. model.infer_vector trains the new documents with the neural weights of the existing model (https://github.com/piskvorky/gensim/blob/develop/gensim/models/doc2vec.py#L684).
Since model.init_sims(replace=True) deletes those weights to save memory, model.infer_vector cannot work. It's the same reason model.train does not work after model.init_sims(replace=True).

If I'm right, it might be good to raise an appropriate error/warning and/or add a comment in the docs.
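A guard of the kind suggested here could look roughly like the sketch below. It is only an illustration, not gensim code: the helper name missing_training_state is hypothetical, and it assumes the attribute names mentioned in this thread (hs, negative, syn1, syn1neg). SimpleNamespace objects stand in for a loaded model so the snippet runs without gensim.

```python
from types import SimpleNamespace


def missing_training_state(model):
    """Return the hidden-layer attributes the model is configured to use but lacks.

    Hypothetical helper: attribute names are taken from this thread,
    not from a guaranteed gensim API.
    """
    missing = []
    # Hierarchical softmax (hs=1) reads syn1 during training and inference.
    if getattr(model, "hs", 0) and not hasattr(model, "syn1"):
        missing.append("syn1")
    # Negative sampling (negative>0) reads syn1neg instead.
    if getattr(model, "negative", 0) and not hasattr(model, "syn1neg"):
        missing.append("syn1neg")
    return missing


# Stand-ins for a loaded model; a real Doc2Vec object would be passed the same way.
intact = SimpleNamespace(hs=1, negative=0, syn0=[[0.1]], syn1=[[0.2]])
trimmed = SimpleNamespace(hs=1, negative=0, syn0=[[0.1]])  # as after init_sims(replace=True)

print(missing_training_state(intact))   # []
print(missing_training_state(trimmed))  # ['syn1']
```

Calling such a check before infer_vector would allow a clear message ("training state was discarded") instead of the bare AttributeError.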

gojomo · 2015-10-16T16:58:22Z

Thanks for your report. Yes, inference works almost exactly like training, so a model with training-state discarded won't be able to reasonably infer either. The comment for init_sims(replace=True) could be a bit clearer.

This area might benefit from a bit more renaming/commenting/refactoring for full clarity, for a few reasons related to what...

The fact that init_sims(replace=True) will clear syn1 (which is only used when hs=1) but not syn1neg (which serves the same role when negative>0) is a bit inconsistent.
There is one variant of Doc2Vec training – pure DBOW – that doesn't initialize or need the words syn0 at all, but would still need the syn1 or syn1neg. So if for some reason you did bother to call init_sims(replace=True) on it, its ability to infer would survive if it were based on negative sampling (since syn1neg isn't discarded)... but would break if using hierarchical-softmax (since syn1 is discarded). So it's unclear if init_sims(replace=True) should imply 'minimize my model in all ways', or if that should become a different explicit step (as suggested in Word2Vec/Doc2Vec offer model-minimization method #446).
In a few projects/papers it has been mentioned that word vectors created by concatenating the syn0 ('context') and syn1/syn1neg ('prediction') vectors can outperform the plain syn0 context vectors. Supporting experiments with that would further change the situations in which syn1/syn1neg should be consulted/discarded.
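To make the concatenation idea in the last point concrete, here is a minimal sketch. It assumes a negative-sampling model in which syn1neg has one row per vocabulary word, aligned with the rows of syn0; with hierarchical softmax the rows of syn1 belong to inner nodes of the Huffman tree rather than to words, so the same row-wise pairing would presumably not apply. The function name concat_word_vectors and the toy numbers are made up for illustration.

```python
def concat_word_vectors(syn0, syn1neg):
    """Pair each word's 'context' row with its 'prediction' row.

    Hypothetical helper; works on any row-aligned sequences of vectors
    (plain lists here, but numpy arrays would iterate the same way).
    """
    if len(syn0) != len(syn1neg):
        raise ValueError("syn0 and syn1neg must have one row per word")
    return [list(ctx) + list(pred) for ctx, pred in zip(syn0, syn1neg)]


# Toy 2-word vocabulary with 2-dimensional vectors.
syn0 = [[0.1, 0.2], [0.3, 0.4]]
syn1neg = [[1.0, 2.0], [3.0, 4.0]]

combined = concat_word_vectors(syn0, syn1neg)
print(combined[0])  # [0.1, 0.2, 1.0, 2.0]
```

With real numpy arrays, numpy.hstack would do the same thing in one call. Either way, this only works while syn1neg is still present, which is why the discard policy of init_sims(replace=True) matters.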

tmylk · 2016-01-09T21:38:19Z

@gojomo Should this be marked as easy?

gojomo · 2016-01-12T15:09:54Z

@tmylk these are ease-of-use/least-surprise/ease-of-understanding issues that overlap a bit with the expressed interest of #446... none of the edits are hard, but deciding what makes the most sense to the average user would take some familiarity with the code/uses.

Cumberbatch08 · 2017-10-09T13:12:55Z

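import codecs
import gensim.models as g

# saved_path, test_docs, output_file, start_alpha and infer_epoch are set elsewhere
m = g.Doc2Vec.load(saved_path)  # load model
with codecs.open(test_docs, "r", "utf-8") as f:
    docs = [x.strip().split() for x in f]  # one tokenized document per line
with open(output_file, "w") as output:
    for d in docs:
        output.write(" ".join(str(x) for x in m.infer_vector(d, alpha=start_alpha, steps=infer_epoch)) + "\n")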

When I run this code, I get the error: AttributeError: 'Doc2Vec' object has no attribute 'neg_labels'
So, what should I do to set the parameters? I am a beginner, thank you!

gojomo · 2017-10-09T18:11:00Z

@StevenChen1993 - Are you receiving a "slow version" warning in logs when you use Doc2Vec? neg_labels is a part of the model only needed/created when the optimized code is unavailable. So you could see this message if the model were created in an environment where gensim was fully installed (training had access to the optimized code), but then re-loaded to an environment where installation of the optimized variants failed. The best fix would be to make sure your deployment installation has the optimized paths (isn't getting the "slow version" message), perhaps by uninstalling and reinstalling gensim and watching for any errors. Otherwise, training/inference could be 100x slower for that environment. (Alternatively, you could patch a neg_labels into your loaded model like down here in the slow path and use the slow inference.)
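One way to check for the "slow version" situation without watching the logs is sketched below. It assumes the FAST_VERSION module attribute that gensim's word2vec module has exposed, where -1 indicated the pure-Python fallback; treat that as an assumption about the installed version rather than a guaranteed API. The helper name gensim_fast_version is made up, and it returns None when gensim cannot be imported at all.

```python
def gensim_fast_version():
    """Return gensim's FAST_VERSION flag, or None if it cannot be determined.

    Hypothetical helper; FAST_VERSION is assumed to exist in
    gensim.models.word2vec for the installed gensim version.
    """
    try:
        from gensim.models import word2vec
    except Exception:
        # gensim is missing, or its import failed (some versions may raise
        # an error other than ImportError when compiled extensions are absent).
        return None
    return getattr(word2vec, "FAST_VERSION", None)


version = gensim_fast_version()
if version is None:
    print("gensim (or its FAST_VERSION flag) is not available here")
elif version < 0:
    print("pure-Python slow path in use; expect much slower training/inference")
else:
    print("optimized code available, FAST_VERSION =", version)
```

Running this in both the training and the deployment environment should show whether the two differ in the way described above.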

Cumberbatch08 · 2017-10-10T02:49:33Z

thanks for your answer!

felixsmueller · 2019-09-17T18:11:54Z

Hi

I had the same exception. The problem was that the model was trained using the fast version but when I installed gensim (3.8.0) on Windows I did not get a warning that the slow version was used.

I followed the instructions at https://radimrehurek.com/gensim/install.html, which then successfully installed the fast version of Gensim (3.8.0) on Windows:
conda install -c conda-forge gensim

PS:
The following did NOT install the fast version on Windows, nor did it print a warning that the slow version was used:
conda install gensim

piskvorky · 2019-09-17T18:50:15Z

Thanks @felixsmueller ; cross-linking to #2600 .

@mpenkov do we instruct people to use the conda-forge instead? I always forget what does what, I'm not familiar / a fan of that ecosystem.

mpenkov · 2019-09-28T21:09:48Z

@piskvorky I'm totally unfamiliar with conda myself. Do any of the gensim developers actually use it? If yes, it'd be good for that person to handle it. If no, then I suppose "one of us" could dedicate some time towards learning more about it, and then come back to solving this problem, although I must admit it isn't a particularly tempting endeavor.

I'm also struggling to understand whether we're dealing with a problem in gensim proper, or if it's a problem with the feedstock (https://github.com/conda-forge/gensim-feedstock/).

piskvorky · 2019-09-28T23:21:55Z

I believe @gojomo has used it.

I guess having binary wheels for Windows fixes most of such issues – we can now just tell people to do pip install. And forget about debugging and updating the proprietary conda ecosystem.

mpenkov · 2019-09-29T04:03:52Z

+1

@gojomo @menshikh-iv Any thoughts?

menshikh-iv · 2019-09-29T08:01:14Z

@mpenkov up to you. conda is widely used by the data-science community, so I'm -1 on dropping it.

mpenkov · 2019-09-29T09:39:51Z

I wonder if we can find a conda zealot who is willing to maintain the feedstock officially. Essentially, a new maintainer for https://github.com/conda-forge/gensim-feedstock/ and a go-to person for conda issues...

piskvorky · 2019-09-29T09:52:35Z

I think there are 3 ways to install Gensim in the conda ecosystem:

1. Using Anaconda (some sort of pre-packaged platform, they charge money for some versions)
   - packages inside, incl. Gensim, are updated by the Continuum Analytics team. I don't think we can do upgrades ourselves.
2. Using an "external" conda-forge channel, which is open source.
   - We could upgrade ourselves, if there's someone to do the maintenance and support. I have zero interest myself.
3. Using normal pip.
   - Easiest option, no extra work for us.

Though I may have messed that up completely! Someone correct me.

gojomo · 2019-09-29T23:48:50Z

I usually like to use (mini)conda to manage my dev environment. (The 'mini' version because I don't want the installation-overhead/complexity/etc of the full 'anaconda' package set.) I tend to install jupyter, numpy, scipy via the native conda installation – to be sure to get their well-maintained/well-optimized versions of those central packages from their repo – but then just pip install things like gensim. That's worked well enough for me, on MacOS & Linux OSes, and handles the same whether using Python 2 or Python 3 (without needing different virtual-environment helpers).

So from my perspective: We don't have to do any extra conda-work, or worry about other 'conda-forge' repos, or whatever – just encourage people to use pip install, no matter their environment.

(But also: this all seems a digression from what I see as the real reason for this bug-report: tiny behavioral differences between the 'optimized' and 'pure-python' paths, plus other recurring issues where the optimized code isn't available. Dropping the pure-python paths entirely will simplify maintenance immensely, though the code would then be less useful as a teaching tool.)

gojomo · 2022-03-01T23:03:06Z

As we no longer have the possibly-divergent pure-Python paths, I don't think this should recur. If that assumption is wrong, feel free to re-open w/ details.

gojomo self-assigned this Oct 16, 2015

gojomo added the bug and documentation labels Oct 16, 2015

xchangcheng mentioned this issue Jul 12, 2016

AttributeError: 'Doc2Vec' object has no attribute 'syn0' when call infer_vector #785

Closed

menshikh-iv added the difficulty easy label Oct 3, 2017

mpenkov added the conda label Sep 29, 2019

gojomo closed this as completed Mar 1, 2022
