
OverflowError: value too large to convert to int32_t #1225

Closed
rulai-huajunzeng opened this issue Jul 26, 2017 · 3 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

@rulai-huajunzeng

Quite similar to issue #589, but I have to open a new one since the old one was closed. Steps to reproduce below:

~/my_dir $ pip show spacy
Name: spacy
Version: 1.8.2
Summary: Industrial-strength Natural Language Processing (NLP) with Python and Cython
Home-page: https://spacy.io
Author: Matthew Honnibal
Author-email: matt@explosion.ai
License: MIT
Location: /usr/lib/python2.7/site-packages
Requires: numpy, murmurhash, cymem, preshed, thinc, plac, six, pathlib, ujson, dill, requests, regex, ftfy
~/my_dir $ python
Python 2.7.13 (default, Dec 22 2016, 09:22:15) 
[GCC 6.2.1 20160822] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import spacy
>>> nlp = spacy.en.English()
>>> nlp.vocab.strings.set_frozen(True)
>>> nlp(u'Whataasdfsdaf')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/spacy/language.py", line 320, in __call__
    doc = self.make_doc(text)
  File "/usr/lib/python2.7/site-packages/spacy/language.py", line 293, in <lambda>
    self.make_doc = lambda text: self.tokenizer(text)
  File "spacy/tokenizer.pyx", line 165, in spacy.tokenizer.Tokenizer.__call__ (spacy/tokenizer.cpp:5486)
  File "spacy/tokenizer.pyx", line 205, in spacy.tokenizer.Tokenizer._tokenize (spacy/tokenizer.cpp:6060)
  File "spacy/tokenizer.pyx", line 279, in spacy.tokenizer.Tokenizer._attach_tokens (spacy/tokenizer.cpp:7129)
  File "spacy/vocab.pyx", line 246, in spacy.vocab.Vocab.get (spacy/vocab.cpp:6986)
  File "spacy/vocab.pyx", line 269, in spacy.vocab.Vocab._new_lexeme (spacy/vocab.cpp:7249)
OverflowError: value too large to convert to int32_t

@honnibal honnibal added the bug Bugs and behaviour differing from documentation label Jul 26, 2017
@honnibal
Member

Thanks for the report! The set_frozen mechanism has been a stop-gap, and I'm not immediately sure what's changed here that's broken it. I'll likely fix the underlying problem for spaCy 2, rather than repairing this. The situation around the streaming data memory growth is much better in spaCy 2, because the integer IDs are now hash values, rather than strings.

@honnibal
Member

Please see #1424

In short: the streaming data memory growth is finally fixed properly in spaCy v2 🎉 . This means the flaky set_frozen functionality could be deleted from the StringStore, resolving this issue.
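The idea behind the v2 fix described above can be sketched as follows. This is a hypothetical illustration of hash-based string IDs, not spaCy's actual StringStore (which uses murmurhash): because each ID is computed from the string itself rather than assigned from a growing counter, unseen tokens never allocate new table entries, so no ID can overflow a fixed-width integer under streaming input.

```python
import hashlib

def string_to_id(s):
    """Map a string to a stable 64-bit integer ID.

    Illustrative sketch only (spaCy v2's real StringStore uses
    murmurhash): deriving the ID from the string's own hash means
    no per-string counter is needed, so novel tokens cannot grow
    an ID table or exceed the ID space the way a sequential int32
    counter could.
    """
    digest = hashlib.blake2b(s.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")

# The same string always maps to the same ID, even across processes.
print(string_to_id("Whataasdfsdaf") == string_to_id("Whataasdfsdaf"))
```

Collisions are possible in principle with any fixed-width hash, but at 64 bits they are vanishingly rare for vocabulary-sized inputs, which is the trade-off this design accepts in exchange for bounded memory.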

@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 8, 2018
2 participants