Feeding a Spider from Redis
Jeremy Chou edited this page May 30, 2023 · 2 revisions
The class scrapy_redis.spiders.RedisSpider enables a spider to read URLs from Redis. The URLs in the Redis queue are processed one after another: if the first request yields more requests, the spider processes those requests before fetching another URL from Redis.
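The processing order described above can be illustrated with a toy model. This is only a sketch and not scrapy-redis code: `crawl_order` and `follow_ups` are hypothetical names, the queues are plain in-memory deques, and follow-up requests are assumed to be handled first-in, first-out, whereas the real scheduler may order them differently.

```python
from collections import deque


def crawl_order(redis_urls, follow_ups):
    """Return the order in which URLs would be visited in this toy model.

    redis_urls: URLs waiting in the (simulated) Redis queue.
    follow_ups: maps a URL to the URLs its response yields (hypothetical data).
    """
    redis_queue = deque(redis_urls)  # stands in for the Redis start-URL queue
    pending = deque()                # requests the spider already holds
    visited = []

    while redis_queue or pending:
        # Only when nothing is pending does the spider take a new URL from Redis.
        if not pending:
            pending.append(redis_queue.popleft())
        url = pending.popleft()
        visited.append(url)
        # Requests yielded by this response are handled before the next Redis URL.
        pending.extend(follow_ups.get(url, []))

    return visited
```

For instance, with two queued URLs where only the first yields follow-up requests, every follow-up is visited before the second queued URL is taken from the simulated queue.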
For example, create a file myspider.py with the code below:
    from scrapy_redis.spiders import RedisSpider


    class MySpider(RedisSpider):
        name = 'myspider'

        def parse(self, response):
            # do stuff
            pass
Then:
- run the spider:

      scrapy runspider myspider.py

- push URLs to Redis:

      redis-cli lpush myspider:start_urls http://google.com
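URLs can also be pushed from a script instead of redis-cli. The sketch below assumes a client object that exposes an `lpush(name, *values)` method, as a redis-py `redis.Redis` instance does. The helpers `start_urls_key` and `push_start_urls`, and the in-memory `FakeRedis` stand-in, are hypothetical names introduced here so the example runs without a Redis server.

```python
def start_urls_key(spider_name):
    # Same "<spider name>:start_urls" key pattern as the redis-cli example.
    return f"{spider_name}:start_urls"


def push_start_urls(client, spider_name, urls):
    """Push URLs onto the spider's start-URL list; return the new list length."""
    urls = list(urls)
    if not urls:
        return 0  # LPUSH needs at least one value, so skip the call
    return client.lpush(start_urls_key(spider_name), *urls)


class FakeRedis:
    """In-memory stand-in (hypothetical) that mimics LPUSH on a single list."""

    def __init__(self):
        self.lists = {}

    def lpush(self, name, *values):
        items = self.lists.setdefault(name, [])
        for value in values:
            items.insert(0, value)  # LPUSH inserts each value at the head
        return len(items)
```

With a real server, the same call would be `push_start_urls(redis.Redis(), "myspider", ["http://google.com"])`, assuming redis-py is installed and Redis is reachable with its default connection settings.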
These spiders rely on the spider idle signal to fetch start URLs, so there may be a delay of a few seconds between the time you push a new URL and the time the spider starts crawling it.