[dev] Optimize batch fetch method to boost throughput #269
base: master
The previous start-URL fetching method only worked when the spider was idle, so it did not achieve full concurrency. This patch optimizes it by using the request_left_downloader signal. Signed-off-by: Tianyue Ren <rentianyue-jk@360shuke.com>
@NiuBlibing Please resolve the assertion error, and add a unit test for |
@rmax What do you think about this implementation? It disabled |
Interesting use of the other signal. Which Scrapy version is required for the new signal? What happens to existing users who override the spider_idle method? Does it make sense to bump the major version? Or, somewhat related, shall we migrate to calendar versioning?
daccc92 to 3245d28
Description
The previous start-URL fetching method only worked when the spider was idle, so it did not achieve full concurrency. This patch optimizes it by using the request_left_downloader signal.
A lock may be needed when calculating need_size.
Fixes #119
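To make the locking concern concrete, here is a minimal sketch of how need_size could be computed atomically each time a request leaves the downloader. It is not the code in this patch: the `BatchRefiller` class, its `reserve` and `on_request_left` methods, and the `batch_size` and `in_flight` names are all hypothetical, and the sketch uses only the standard library rather than Scrapy itself. In a real spider, `on_request_left` would presumably be called from a handler connected to the request_left_downloader signal.

```python
import threading


class BatchRefiller:
    """Hypothetical helper: tracks in-flight requests and sizes the next fetch."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.in_flight = 0
        self._lock = threading.Lock()

    def reserve(self):
        """Compute need_size and mark those slots as in flight, atomically."""
        with self._lock:
            need_size = max(self.batch_size - self.in_flight, 0)
            self.in_flight += need_size
            return need_size

    def on_request_left(self):
        """Free one slot; intended to run when a request leaves the downloader."""
        with self._lock:
            self.in_flight = max(self.in_flight - 1, 0)


refiller = BatchRefiller(batch_size=4)
first = refiller.reserve()    # nothing in flight yet, so a full batch is needed
refiller.on_request_left()    # one request finished
second = refiller.reserve()   # only the freed slot needs refilling
```

Holding the lock across both the read and the update of `in_flight` keeps two concurrent callers from reserving the same free slots. Whether a lock is actually required depends on whether the handlers can run concurrently in the deployment.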
How Has This Been Tested?
Test Configuration:
Checklist: