SQLite queue is using all CPU on high frequency poller (<1s) #475

Closed
pspsdev opened this issue Mar 8, 2023 · 13 comments

Comments

@pspsdev

pspsdev commented Mar 8, 2023

When running spiders that do nothing at all, the SQLite-based poller uses all available CPU just reading scheduled tasks. It would be good to have plug-and-play alternative queues, such as Redis.

@pspsdev changed the title from "SQLite queue is using all CPU" to "SQLite queue is using all CPU on high frequency poller (<1s)" on Mar 8, 2023
@pspsdev
Author

pspsdev commented Mar 8, 2023

Related:
#197

@jpmckinney
Contributor

Why are you running spiders that "do nothing at all"?

@pspsdev
Author

pspsdev commented Mar 8, 2023

@jpmckinney Just to rule out that the CPU is being used by a spider. This can be replicated by scheduling a lot of jobs with a polling rate below one second, e.g. 0.1. The SQLite queue will use a massive amount of CPU.

@pspsdev
Author

pspsdev commented Mar 8, 2023

There are also some unmaintained repos that try to solve this:
https://github.com/speakol-ads/scrapyd-redis

Simply put, the SQLite queue is a really bad option for high-frequency queues.

@jpmckinney
Contributor

Hmm, yeah, same with https://github.com/Tiago-Lira/scrapyd-mongodb (from which scrapyd-redis is forked) and https://github.com/balena/python-pqueue (mentioned in #197).

https://github.com/peter-wangxu/persist-queue is still active, though maybe a first attempt is to switch to https://github.com/scrapy/queuelib as mentioned in #197.

Can you share your setup for reproducing the issue?

@pspsdev
Author

pspsdev commented Mar 8, 2023

I will try to create a demo later, but it can pretty much be an empty scrapyd service running 1 spider that does nothing, then creating about 50 schedules per second with a polling rate of 0.1. It will saturate a powerful CPU.
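
A minimal sketch of the load generator described above, in case it helps with reproducing. It posts to Scrapyd's `schedule.json` endpoint at a fixed rate. The host and port, the project name `demo`, the spider name `noop`, and the helper names are all assumptions for illustration and are not taken from this thread.

```python
import time
import urllib.parse
import urllib.request

# Assumption: Scrapyd listens on localhost:6800 and a project with a
# do-nothing spider has already been deployed.
SCRAPYD_URL = "http://localhost:6800/schedule.json"


def build_schedule_request(project, spider, url=SCRAPYD_URL):
    """Build a POST request for Scrapyd's schedule.json endpoint."""
    data = urllib.parse.urlencode({"project": project, "spider": spider}).encode()
    return urllib.request.Request(url, data=data, method="POST")


def schedule_burst(project, spider, per_second=50, seconds=10,
                   send=urllib.request.urlopen):
    """Schedule roughly `per_second` jobs each second for `seconds` seconds.

    `send` is injectable so the loop can be exercised without a server.
    Returns the number of requests sent.
    """
    sent = 0
    for _ in range(seconds):
        started = time.monotonic()
        for _ in range(per_second):
            # Close each response promptly; the body is not needed here.
            with send(build_schedule_request(project, spider)):
                sent += 1
        # Sleep out the remainder of the current second, if any is left.
        time.sleep(max(0.0, 1.0 - (time.monotonic() - started)))
    return sent
```

With Scrapyd running and `poll_interval = 0.1` set in its configuration, calling `schedule_burst("demo", "noop")` should approximate the scenario; whether it triggers the CPU usage likely depends on the machine and the Scrapyd version.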

@pspsdev
Author

pspsdev commented Mar 8, 2023

Also, in my personal opinion, it would make sense to add an interface for plugging in your own queue backend instead of resorting to hacks like the 2 repos mentioned above.

@pspsdev
Author

pspsdev commented Mar 8, 2023

And then later SQLite can be switched to some other default if needed, but having a simple method to replace the queue on your own would be a very good option to quickly solve this problem for those who use high-frequency polling.

@jpmckinney
Contributor

Do you have your own queue ready to use? You can try it with this PR: #476
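
For anyone without a queue at hand, here is a rough sketch of what a replacement might look like. It is an in-memory priority queue exposing the methods I understand Scrapyd's spider queue interface to expect (`add`, `pop`, `count`, `list`, `remove`, `clear`). The class name is made up, the constructor signature is an assumption that may differ between Scrapyd versions, and nothing here is persisted, so pending jobs are lost on restart.

```python
import heapq
import itertools


class InMemorySpiderQueue:
    """Hypothetical non-persistent spider queue; highest priority pops first."""

    def __init__(self, config=None, collection=None):
        # Assumption: Scrapyd passes its config and the project name here.
        self._heap = []
        # The counter breaks ties so equal priorities pop in insertion order
        # and message dicts are never compared with each other.
        self._counter = itertools.count()

    def add(self, name, priority=0.0, **spider_args):
        message = dict(spider_args, name=name)
        # heapq is a min-heap, so negate the priority.
        heapq.heappush(self._heap, (-priority, next(self._counter), message))

    def pop(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def count(self):
        return len(self._heap)

    def list(self):
        ordered = sorted(self._heap, key=lambda entry: entry[:2])
        return [entry[2] for entry in ordered]

    def remove(self, func):
        """Drop every message for which func(message) is true; return how many."""
        kept = [entry for entry in self._heap if not func(entry[2])]
        removed = len(self._heap) - len(kept)
        heapq.heapify(kept)
        self._heap = kept
        return removed

    def clear(self):
        self._heap.clear()
```

The class would then be referenced by its dotted path in the Scrapyd configuration, assuming the option for that is the `spiderqueue` setting. A Redis-backed version could follow the same shape, with the heap swapped for a sorted set.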

@pspsdev
Author

pspsdev commented Mar 8, 2023

@jpmckinney Thanks, give me a few hours and I will try it out.

@jpmckinney
Contributor

@pspsdev Now that #476 is merged, do you have suggestions for how to edit the default spider queue, or should there just be a note in the documentation that it doesn't perform well under high frequency polling, and a custom queue would be better?

@pspsdev
Author

pspsdev commented Mar 10, 2023

@jpmckinney I am still doing some tests on my end. Give me a few days and I will report back with more details.

@jpmckinney
Contributor

FWIW, I can't replicate this issue. I set poll_interval = 0.1 and scheduled 100 jobs in a loop.

jpmckinney added a commit that referenced this issue Jul 25, 2024