"OverflowError: value too large to convert to int" when training word2vec on a large corpus #2578
Comments
CC @persiyanov – is this expected? What "value" is that?
Looking at the Or was there some reason for using only
The same problem also occurs with FastText:
@persiyanov Ping on this.
Another instance of 32 vs 64 bit integers causing trouble for users: https://groups.google.com/forum/#!topic/gensim/XbH5Sr6RBcI Not sure if it's the exact same instance as in this ticket… perhaps we need to promote counters from 32 bit to 64 bit more consistently, everywhere.
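To make the 32 vs 64 bit point concrete, here is a minimal sketch in plain Python. It assumes the overflowing value is a word or example counter, which this thread suggests but does not confirm. The helper `fits_in_signed` and the word count are hypothetical and are not part of Gensim.

```python
def fits_in_signed(value, bits):
    """Return True if `value` is representable as a signed integer of `bits` width."""
    low = -(2 ** (bits - 1))
    high = 2 ** (bits - 1) - 1
    return low <= value <= high


# Largest value a signed 32-bit counter can hold.
INT32_MAX = 2 ** 31 - 1

# Hypothetical word count for a very large corpus (illustrative number only,
# not measured from the corpus in this ticket).
hypothetical_total_words = 90_000_000_000

print("int32 max:", INT32_MAX)
print("fits in 32 bits:", fits_in_signed(hypothetical_total_words, 32))
print("fits in 64 bits:", fits_in_signed(hypothetical_total_words, 64))
```

A counter just past 2**31 - 1 already fails the 32-bit check, while a 64-bit counter leaves ample headroom for corpora of this size.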
@mpenkov @piskvorky My apologies for the late response. No, I don't think there was a deliberate choice of using Do you need a hand on this? I can devote some time next week to fix it.
Definitely. If you could grep / review all the *2vec C code for such 32=>64 bit fixes, that would be awesome. We don't want any 32 bit counters. Thanks!
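A rough sketch of what such a grep could look like in Python. The regular expression is a heuristic assumption: it only catches lines of the form `cdef int <name>`, reports only the first name on a line, and will also flag `cdef int` function declarations. The function names `find_int_declarations` and `scan_tree` are hypothetical, not Gensim tooling.

```python
import re
from pathlib import Path

# Heuristic pattern (an assumption for illustration): a line that starts
# with `cdef int <name>`. Other declaration styles are not detected.
CDEF_INT = re.compile(r"^\s*cdef\s+int\s+(\w+)")


def find_int_declarations(source):
    """Yield (line_number, name) for each `cdef int <name>` line in `source`."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = CDEF_INT.match(line)
        if match:
            yield lineno, match.group(1)


def scan_tree(root):
    """Collect matches from every .pyx and .pxd file under `root`."""
    hits = {}
    for pattern in ("*.pyx", "*.pxd"):
        for path in Path(root).rglob(pattern):
            text = path.read_text(encoding="utf-8", errors="replace")
            found = list(find_int_declarations(text))
            if found:
                hits[str(path)] = found
    return hits


# Small in-memory demonstration on made-up Cython source.
sample = "cdef int total_words = 0\ncdef long long big = 0\n    cdef int i\n"
for lineno, name in find_int_declarations(sample):
    print(lineno, name)
```

Each hit still needs a manual review: loop indices can safely stay 32 bit, while anything that accumulates over the whole corpus is a candidate for a 64-bit type.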
@piskvorky
@Xiaoxiong-Liu yes, a fix should have been a part of 4.0.0beta. Let me know if this still appears after you upgrade Gensim:
Hi,
I am trying to train word2vec on a large corpus (>500GB) with the newest gensim version (3.8.0), and an error occurs on every thread:
Exception in thread Thread-10:
Traceback (most recent call last):
  File "/home/michael/miniconda3/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/home/michael/miniconda3/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "/home/michael/miniconda3/lib/python3.7/site-packages/gensim/models/base_any2vec.py", line 175, in _worker_loop_corpusfile
    total_examples=total_examples, total_words=total_words, **kwargs)
  File "/home/michael/miniconda3/lib/python3.7/site-packages/gensim/models/word2vec.py", line 794, in _do_train_epoch
    total_examples, total_words, work, neu1, self.compute_loss)
  File "gensim/models/word2vec_corpusfile.pyx", line 379, in gensim.models.word2vec_corpusfile.train_epoch_cbow
OverflowError: value too large to convert to int
My command for training the model is:
Word2Vec(corpus_file="encc_tokenized", size = 1024, window = 8, workers = 16)
Thank you for your help.