Feeding a Spider from Redis

The class scrapy_redis.spiders.RedisSpider enables a spider to read its start URLs from Redis. The URLs in the Redis queue are processed one after another. If the first request yields more requests, the spider processes those requests before fetching another URL from Redis.

For example, create a file myspider.py with the code below:

from scrapy_redis.spiders import RedisSpider

class MySpider(RedisSpider):
    name = 'myspider'

    def parse(self, response):
        # do stuff
        pass

Then:

Run the spider:

scrapy runspider myspider.py

Push URLs to Redis:

redis-cli lpush myspider:start_urls http://google.com
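The same push can also be done from Python instead of redis-cli. The sketch below is only an illustration: the helper names `start_urls_key` and `push_start_urls` are hypothetical, and the key pattern `<spider name>:start_urls` is inferred from the example above. The `client` argument is assumed to be any object exposing `lpush(key, *values)`, as redis-py clients do.

```python
def start_urls_key(spider_name):
    """Build the queue key, assuming the '<name>:start_urls' pattern from the example."""
    return f"{spider_name}:start_urls"


def push_start_urls(client, spider_name, urls):
    """Push URLs onto the spider's queue with LPUSH.

    `client` is assumed to expose lpush(key, *values), like a redis-py client.
    Returns whatever the client returns (for Redis, the new list length),
    or 0 when there is nothing to push.
    """
    urls = list(urls)
    if not urls:
        # LPUSH requires at least one value, so skip the call entirely.
        return 0
    return client.lpush(start_urls_key(spider_name), *urls)
```

With the redis-py package installed, a real client such as `redis.Redis(host="localhost", port=6379)` could be passed as `client`; the host and port shown here are assumptions about a local setup.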

Note

These spiders rely on the spider idle signal to fetch start URLs, so there may be a delay of a few seconds between the time you push a new URL and the time the spider starts crawling it.
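
The processing order described on this page can be illustrated with a toy model. This is not scrapy-redis code: `crawl_order`, the `links` mapping, and the in-memory queues are all hypothetical stand-ins. It only shows the idea that a new URL is taken from the shared queue when the spider has no pending requests of its own.

```python
from collections import deque


def crawl_order(redis_urls, links):
    """Return the order in which URLs are processed in this toy model.

    redis_urls: URLs waiting in the shared queue (stand-in for the Redis list).
    links: hypothetical mapping from a URL to the URLs its response yields.
    """
    shared = deque(redis_urls)  # stand-in for the Redis list
    local = deque()             # stand-in for the spider's pending requests
    seen = set()
    order = []
    while shared or local:
        if not local:
            # "Idle": nothing pending locally, so take one URL from the shared queue.
            local.append(shared.popleft())
        url = local.popleft()
        if url in seen:
            continue
        seen.add(url)
        order.append(url)
        # Follow-up requests are queued locally and handled before the next fetch.
        local.extend(links.get(url, []))
    return order
```

For example, `crawl_order(["a", "b"], {"a": ["a1", "a2"]})` processes "a" and both of its follow-up URLs before "b" is taken from the shared queue.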
