Database-based Scrapy components, similar to scrapy-redis but using a database as the queue.
- Distributed crawling/scraping

  You can start multiple spider instances that share a single DB queue. Best suited for broad multi-domain crawls.
- Distributed post-processing

  Scraped items get pushed into a DB queue, meaning that you can start as many post-processing processes as needed, all sharing the items queue (see the worker sketch after this list).
- Scrapy plug-and-play components

  Scheduler + Duplication Filter, Base Spiders.
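For the post-processing side, here is a minimal sketch of a worker that polls the items queue with peewee (one of the project's dependencies). The table name, column names, and serialization format are assumptions for illustration only; adapt them to the schema scrapy-db actually creates.

```python
# Hypothetical post-processing worker. The table and column names below are
# assumptions -- adapt them to the schema scrapy-db actually creates.
import json
import time

from peewee import AutoField, Model, MySQLDatabase, TextField

# Same MySQL database that backs the shared queue (credentials are placeholders).
db = MySQLDatabase("scrapy_queue", host="localhost", port=3306,
                   user="user", password="password")


class ScrapedItem(Model):
    """Maps an assumed table that the crawl side fills with scraped items."""
    id = AutoField()
    data = TextField()  # serialized item, assumed here to be JSON

    class Meta:
        database = db
        table_name = "scraped_items"  # assumed table name


def process(item: dict) -> None:
    """Placeholder post-processing step; replace with your own logic."""
    print(item)


if __name__ == "__main__":
    db.connect()
    while True:
        rows = list(ScrapedItem.select().order_by(ScrapedItem.id).limit(100))
        if not rows:
            time.sleep(1)  # queue is empty, wait before polling again
            continue
        for row in rows:
            process(json.loads(row.data))
            row.delete_instance()  # pop the item so it is not processed again
```

You can run several such workers in parallel; in a real deployment you would add row locking (for example `SELECT ... FOR UPDATE` inside a transaction) so concurrent workers do not pick up the same item twice.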
- Python 3.7+
- peewee >= 3.16.0
- Scrapy >= 2.7.0
- pymysql >= 1.0.3
From pip
pip install scrapy-db
From GitHub
git clone https://github.com/libra146/scrapy-db.git
cd scrapy-db
python setup.py install
From poetry
poetry add scrapy-db
If you are running distributed crawling tasks, scrapy-db is a practical Scrapy component that can help you complete them more efficiently.
Clone the current project and run the example crawler in example-project to try it out.
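The example project contains the full configuration; as a rough sketch, enabling the components in your own settings.py would look something like the following. The module paths and setting names here are assumptions modeled on scrapy-redis, so check example-project for the exact values.

```python
# settings.py sketch -- module paths and setting names are assumptions
# modeled on scrapy-redis; check example-project for the exact values.

# Use the database-backed scheduler instead of Scrapy's default (assumed path).
SCHEDULER = "scrapy_db.scheduler.Scheduler"

# Deduplicate requests through the database instead of in memory (assumed path).
DUPEFILTER_CLASS = "scrapy_db.dupefilters.DBDupeFilter"

# Connection string for the shared queue database (assumed setting name);
# the mysql:// form is what peewee/pymysql understand.
DB_URL = "mysql://user:password@localhost:3306/scrapy_queue"

# Keep the queue and dupefilter between runs so crawls can be paused and
# resumed (assumed setting name, mirroring scrapy-redis' SCHEDULER_PERSIST).
SCHEDULER_PERSIST = True
```

With settings like these, every process started with `scrapy crawl <spider>` pulls requests from the same database queue, so you can launch as many spider instances as you need, on one machine or several.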
This repository is still under development and may be unstable.
Because I have a huge request pool and not enough memory for Redis to hold it, I turned to a database instead. I built this project with reference to scrapy-redis, and it works fine.