
vectors.pyx - dictionary size changed during iteration #9746

Closed
raqibhayder opened this issue Nov 25, 2021 · 7 comments · Fixed by #9868
Labels
bug Bugs and behaviour differing from documentation feat / vectors Feature: Word vectors and similarity

Comments

@raqibhayder

raqibhayder commented Nov 25, 2021

When merging entities using doc.retokenize context manager, I am facing the same issue described in #8013 (reposting because #8013 was closed):

with doc.retokenize() as retokenizer:
    for span in new_entities:
        retokenizer.merge(doc[span.start:span.end])
return doc

[Screenshot (2021-11-25): traceback ending in RuntimeError: dictionary size changed during iteration]

How to reproduce the behaviour

Fortunately, I was able to trace the error back to all the texts for which it was thrown (this only happens in production), and I tried to recreate the behaviour locally but could not. I am processing thousands of texts, and this error only seems to happen occasionally.

Note: This happens for both nlp(text) and nlp.pipe(list_of_texts)
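For context, the underlying Python behaviour is easy to reproduce with a plain dict, independent of spaCy: adding a key while iterating over the live mapping raises the same RuntimeError. This is a standalone sketch (the function name is made up for illustration, and this is not the actual vectors.pyx code):

```python
# Standalone illustration (not spaCy code): mutating a dict while
# iterating over it raises the RuntimeError reported above.
def grow_during_iteration(d):
    for key in d:             # iterating the live dict
        d[key + "_new"] = 0   # insertion changes the dict's size mid-loop

d = {"a": 1, "b": 2}
raised = False
try:
    grow_during_iteration(d)
except RuntimeError:
    raised = True  # "dictionary changed size during iteration"
```

In vectors.pyx the loop and the mutation happen in different places, which is why the failure is timing-dependent and hard to reproduce on demand.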

Your Environment

  • spaCy version: 3.1.2
  • Platform: Google App Engine Instance
  • Python version: 3.8.0
  • Pipelines: en_core_web_md
@adrianeboyd
Contributor

It's frustrating that this is so hard to reproduce, but I guess there's no real harm in modifying this on our end. Can you confirm whether changing this to copy().items() fixes the problem, at least as far as you can easily check?
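The suggested change iterates over a snapshot rather than the live mapping, so concurrent insertions can no longer invalidate the iterator. A minimal sketch of the pattern, using a stand-in dict named key2row (this is not the actual vectors.pyx code):

```python
# key2row here is a plain dict standing in for the attribute in vectors.pyx.
key2row = {"apple": 0, "banana": 1}

# Iterating over list(key2row.copy().items()) walks a snapshot, so
# inserting into key2row inside the loop cannot raise
# "dictionary changed size during iteration".
for key, row in list(key2row.copy().items()):
    key2row[key + "_alias"] = row  # safe: the loop reads the copy
```

The copy costs O(n) time and memory per pass, which is the trade-off for making the loop robust against mutation.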

@adrianeboyd adrianeboyd added bug Bugs and behaviour differing from documentation feat / vectors Feature: Word vectors and similarity labels Nov 26, 2021
@raqibhayder
Author

@adrianeboyd: I ran the following experiment:

  • With copy().items()

    1. Compiled spaCy (branch - v3.1.x) from source. Followed the instructions here and changed L210 in vectors.pyx to for key, row in list(self.key2row.copy().items()):
    2. Created ~10 million docs
    3. No RuntimeError: dictionary size changed during iteration error
  • With items()

    1. Used precompiled spaCy 3.1.2
    2. Created ~10 million docs
    3. No RuntimeError: dictionary size changed during iteration error

As there was no difference in results, the experiment was inconclusive IMO.

The only other option I see is to deploy the version compiled with copy().items() to production (where this is happening often). I would prefer not to go down that path if possible.

@raqibhayder
Author

@adrianeboyd: I was wondering whether there will be a patch for this in a future release?

@adrianeboyd adrianeboyd linked a pull request Dec 15, 2021 that will close this issue
@adrianeboyd
Contributor

adrianeboyd commented Dec 15, 2021

We can plan this for the next patch release, but I'm not sure of an exact date at this point.

@raqibhayder
Author

Thank you 😄 !

@svlandeg
Member

It'll be shipped in 3.2.2, but like Adriane says, we don't have a date yet, so keep an eye on the release page / twitter ;-)

@github-actions
Contributor

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 16, 2022