SQLite queue is using all CPU on high frequency poller (<1s) #475
Related:
Why are you running spiders that "do nothing at all"?
@jpmckinney Just to rule out that the CPU is being used by a spider. This can be replicated by scheduling a lot of jobs with a polling interval below one second, e.g. 0.1 s. The SQLite queue will then use a massive amount of CPU.
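To make the scaling concrete, here is a minimal Python sketch of the polling pattern. It assumes (this is not taken from scrapyd's source) that each poller tick runs one `COUNT(*)` query against every project's SQLite queue; the in-memory database and the `make_queue_db`, `poll_once` and `queries_per_second` helpers are hypothetical. Under that assumption the query rate is the number of projects divided by the poll interval, so lowering scrapyd's `poll_interval` setting from its documented default of 5 seconds to 0.1 multiplies the polling work by 50.

```python
import sqlite3

# Hypothetical model of the polling pattern (an assumption, not scrapyd's
# actual code): on every tick the poller asks each project's SQLite-backed
# queue how many jobs are pending.


def make_queue_db(projects):
    """Create an in-memory database with one queue table per project."""
    conn = sqlite3.connect(":memory:")
    for name in projects:
        conn.execute(
            f'CREATE TABLE "{name}" '
            "(id INTEGER PRIMARY KEY, priority REAL, message TEXT)"
        )
    return conn


def poll_once(conn, projects):
    """Run one poller tick; return (queries_issued, pending_per_project)."""
    pending = {}
    for name in projects:
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
        pending[name] = count
    return len(projects), pending


def queries_per_second(num_projects, poll_interval):
    """COUNT queries issued per second when every tick checks every project."""
    return num_projects / poll_interval


if __name__ == "__main__":
    projects = ["alpha", "beta"]
    conn = make_queue_db(projects)
    conn.execute('INSERT INTO "alpha" (priority, message) VALUES (0, ?)', ("job",))
    print(poll_once(conn, projects))   # (2, {'alpha': 1, 'beta': 0})
    print(queries_per_second(2, 5.0))  # 0.4
    print(queries_per_second(2, 0.1))  # 20.0
```

Counting queries is not the same as measuring CPU load; whether this rate actually saturates a CPU is exactly what the rest of the thread tries to establish.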
There are also some unmaintained repos that try to solve this. Simply put, the SQLite queue is a really bad option for high-frequency queues.
Hmm, yeah, same with https://github.com/Tiago-Lira/scrapyd-mongodb (from which scrapyd-redis is forked) and https://github.com/balena/python-pqueue (mentioned in #197). https://github.com/peter-wangxu/persist-queue is still active, though maybe a first attempt is to switch to https://github.com/scrapy/queuelib as mentioned in #197. Can you share your setup for reproducing the issue?
I will try to create a demo later, but it can pretty much be an empty scrapyd service running one spider that does nothing. Then create about 50 schedules per second and set the polling interval to 0.1. It will max out even a powerful CPU.
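A rough sketch of that reproduction, assuming scrapyd listens on its default port 6800 and that `myproject` and `noop` are placeholder names for a deployed project and a do-nothing spider. It posts to scrapyd's `schedule.json` endpoint using only the standard library; `build_schedule_request` and `flood` are hypothetical helpers, and the 0.1 polling interval is set separately via `poll_interval` in the `[scrapyd]` section of `scrapyd.conf`.

```python
import time
import urllib.parse
import urllib.request

# Assumptions: scrapyd runs at base_url, and the project/spider names passed
# in are placeholders for a deployed project and a spider that does nothing.


def build_schedule_request(base_url, project, spider):
    """Build a form-encoded POST to scrapyd's schedule.json endpoint."""
    data = urllib.parse.urlencode({"project": project, "spider": spider}).encode()
    return urllib.request.Request(f"{base_url}/schedule.json", data=data, method="POST")


def flood(base_url, project, spider, per_second=50, seconds=10):
    """Send per_second * seconds schedule requests, pausing 1/per_second
    between them, so the real rate is at most per_second."""
    delay = 1.0 / per_second
    for _ in range(per_second * seconds):
        request = build_schedule_request(base_url, project, spider)
        with urllib.request.urlopen(request) as response:
            response.read()  # drain the JSON reply before the next request
        time.sleep(delay)
```

Calling `flood("http://localhost:6800", "myproject", "noop")` against a running instance sends 500 requests over at least ten seconds; because each request's own latency adds to the pause, the actual rate is at most 50 per second.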
Also, in my opinion it would make sense to add an interface for plugging in your own queue backend, instead of relying on hacks like the two repos mentioned above.
SQLite could then be switched to some other default later if needed, but a simple way to replace the queue yourself would be a quick fix for those who use high-frequency polling.
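A pluggable backend could look roughly like the Python sketch below: a small interface plus one alternative implementation. The method names (`add`, `pop`, `count`, `list`, `remove`, `clear`) are assumptions chosen to cover the operations a spider queue needs; they are not necessarily the interface PR #476 exposes. `SpiderQueue` and `MemorySpiderQueue` are hypothetical, and the in-memory version trades SQLite's persistence for cheap polling.

```python
import heapq
import itertools
from typing import Any, Callable, Dict, List, Optional, Protocol

Message = Dict[str, Any]


class SpiderQueue(Protocol):
    """Hypothetical backend interface; method names are illustrative only."""

    def add(self, name: str, priority: float = 0.0, **spider_args: Any) -> None: ...
    def pop(self) -> Optional[Message]: ...
    def count(self) -> int: ...
    def list(self) -> List[Message]: ...
    def remove(self, func: Callable[[Message], bool]) -> int: ...
    def clear(self) -> None: ...


class MemorySpiderQueue:
    """In-memory priority queue: higher priority pops first, FIFO among ties.
    State lives in process memory, so pending jobs are lost on restart."""

    def __init__(self) -> None:
        self._heap: List[tuple] = []
        self._counter = itertools.count()

    def add(self, name, priority=0.0, **spider_args):
        message = {"name": name, **spider_args}
        # heapq is a min-heap, so negate priority to pop the highest first;
        # the counter breaks ties in insertion order (messages are never compared).
        heapq.heappush(self._heap, (-priority, next(self._counter), message))

    def pop(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def count(self):
        return len(self._heap)

    def list(self):
        # Pending messages in the order pop() would return them.
        return [entry[2] for entry in sorted(self._heap)]

    def remove(self, func):
        # Drop every message for which func returns True; report how many.
        kept = [entry for entry in self._heap if not func(entry[2])]
        removed = len(self._heap) - len(kept)
        heapq.heapify(kept)
        self._heap = kept
        return removed

    def clear(self):
        self._heap.clear()
```

For example, after `q = MemorySpiderQueue()`, `q.add("noop")` and `q.add("urgent", priority=5)`, the first `q.pop()` returns `{"name": "urgent"}`. A Redis-backed class exposing the same methods could be swapped in without touching the poller, which is the point of the proposed interface.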
Do you have your own queue ready to use? You can try it with this PR: #476
@jpmckinney Thanks, give me a few hours and I will try it out.
@jpmckinney I am still doing some tests on my end; give me a few days and I will report back with more details.
FWIW, I can't replicate this issue. I set
…erqueue" (#475 is not reproducible)
When running spiders that do nothing at all, the SQLite-based poller uses all the CPU just reading scheduled tasks. It would be good to have plug-and-play alternative queues, such as Redis.