spacy.en.English() takes a very long time to load on a newly started EC2 instance #890

Closed
jspalink opened this issue Mar 16, 2017 · 2 comments

@jspalink

I have a web service that provides data, including some NLP analysis, to internal tools at work. The first time a new EC2 instance is launched, the spacy.en.English() call takes a very long time to load, on the order of 15 minutes. This happens only once: if I restart the EC2 instance, it loads as expected (10-20 seconds). The slow load occurs only on the first boot after launching the instance from an AMI (which already has spaCy installed, its data downloaded, etc.).

I can't think of why this would happen unless it is trying to re-download something automatically. CPU usage during the slow call hovers around 1%, while during a normal load it runs at around 98%. Any ideas or suggestions on where to look? I'm trying to create auto-scaling groups, but that's really hard to do if it takes so long to start up a brand-new instance.

Your Environment

  • Operating System: Linux version 4.4.51-40.58.amzn1.x86_64 (mockbuild@gobi-build-64011) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) ) #1 SMP Tue Feb 28 21:57:17 UTC 2017
  • Python Version Used: Python 2.7.12 (default, Sep 1 2016, 22:14:00)
  • spaCy Version Used: 1.6.0
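The roughly 1% CPU usage during the slow first load suggests the process is mostly waiting on disk reads rather than computing; that is an inference, not something confirmed in this thread. One known cause of slow first reads on a freshly launched instance is that an EBS volume restored from a snapshot (such as the root volume of an EBS-backed AMI) fetches each block from S3 the first time that block is read. If that is what is happening here, one workaround to try is reading the model data once at boot, before the service starts taking traffic. A minimal sketch, assuming the model data sits under a single directory on disk; `warm_directory` is a hypothetical helper, not part of spaCy:

```python
import os


def warm_directory(root, chunk_size=1 << 20):
    """Read every regular file under `root` once; return total bytes read.

    Hypothetical helper: forcing a full read at boot moves any slow
    first-access reads into this step, before the model is loaded.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks, pipes, sockets, etc.
            with open(path, "rb") as fh:
                # Read in fixed-size chunks so large model files are not
                # held in memory all at once.
                while True:
                    chunk = fh.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

Call it at boot with the directory that holds the installed model data (the exact location depends on the spaCy version), then load the model as usual. This does the same reads the loader would; the point is only to move that cost into a step you can schedule and monitor, for example before an auto-scaling health check marks the instance as ready.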
ines added this to the v1.7.0 milestone Mar 18, 2017
@ines
Member

ines commented Mar 18, 2017

We just pushed a new release that also ships a smaller model, which should take significantly less time to load (~50MB, and about 2% less accurate than the larger model).

The small model is now the default, so in 1.7.0 you can simply run:

python -m spacy download en # download small model (50MB)
python -m spacy download en_core_web_md # download medium model (1GB)
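To check whether the smaller default model actually shortens start-up on a fresh instance, it helps to log how long the load call itself takes, both on the first boot and after a restart. A minimal sketch using only the standard library; `timed_load` is a hypothetical helper name, and it uses `time.time()` so it also runs on Python 2.7, the version reported in this issue:

```python
import time


def timed_load(loader, *args, **kwargs):
    """Call loader(*args, **kwargs) and return (result, elapsed_seconds).

    Hypothetical helper for measuring model start-up time; it works with
    any callable, so the same code can time different models or loaders.
    """
    start = time.time()
    result = loader(*args, **kwargs)
    elapsed = time.time() - start
    return result, elapsed
```

For example, after running the download command above, `nlp, seconds = timed_load(spacy.load, 'en')` loads the default model in spaCy 1.7 and gives you a number to log at service start-up, which makes a 15-minute first boot easy to spot across an auto-scaling group.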

ines closed this as completed Mar 18, 2017
@lock

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 9, 2018