Create a scrapy backed cache based on WARC #6

Open
turicas opened this issue Oct 26, 2019 · 0 comments
Owner

turicas commented Oct 26, 2019

If we extract the WARC-writing code from the spider and add a matching WARC-reading routine, we can implement a Scrapy HTTP cache storage backend that reads and writes WARC files. This would make it easy to create archives by re-running old spiders with only the cache setting changed.
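A minimal sketch of the two pieces the proposal needs, a WARC write routine and a matching WARC read routine, assuming uncompressed WARC/1.0 `response` records and using only the Python standard library. The function names `write_response_record`, `read_response_records` and `load_cache` are hypothetical and are not this project's spider code. The reader only understands the record layout the writer produces: it does not handle gzip-compressed `.warc.gz` files, chunked transfer encoding, or repeated HTTP header names.

```python
# Hypothetical helpers (not this project's code): a minimal, uncompressed
# WARC/1.0 writer and reader for 'response' records, stdlib only.
import io
import uuid
from datetime import datetime, timezone
from http import HTTPStatus


def _reason(status):
    """Return the standard reason phrase for a status code, or '' if unknown."""
    try:
        return HTTPStatus(status).phrase
    except ValueError:
        return ""


def write_response_record(stream, url, status, headers, body):
    """Append one uncompressed WARC/1.0 'response' record to a binary stream."""
    # Record block = raw HTTP response: status line, headers, blank line, body.
    # The original protocol version is not known here, so HTTP/1.1 is assumed.
    lines = [f"HTTP/1.1 {status} {_reason(status)}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    block = ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1") + body
    warc_headers = {
        "WARC-Type": "response",
        "WARC-Record-ID": f"<urn:uuid:{uuid.uuid4()}>",
        "WARC-Date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "WARC-Target-URI": url,
        "Content-Type": "application/http; msgtype=response",
        "Content-Length": str(len(block)),
    }
    head = "WARC/1.0\r\n"
    head += "".join(f"{k}: {v}\r\n" for k, v in warc_headers.items())
    # A blank line ends the WARC headers; two CRLFs terminate the record.
    stream.write(head.encode("utf-8") + b"\r\n" + block + b"\r\n\r\n")


def read_response_records(stream):
    """Yield (url, status, headers, body) for every 'response' record."""
    while True:
        version = stream.readline()
        if not version:
            return  # clean end of file
        if not version.strip():
            continue  # tolerate stray blank lines between records
        fields = {}
        while True:
            raw = stream.readline()
            if raw in (b"\r\n", b""):
                break  # blank line (or EOF) ends the WARC header section
            name, _, value = raw.decode("utf-8").partition(":")
            fields[name.strip()] = value.strip()
        block = stream.read(int(fields["Content-Length"]))
        stream.read(4)  # skip the b"\r\n\r\n" record terminator
        if fields.get("WARC-Type") != "response":
            continue  # e.g. request or metadata records are ignored
        # Split the HTTP head from the body at the first blank line.
        head, _, body = block.partition(b"\r\n\r\n")
        status_line, *header_lines = head.decode("latin-1").split("\r\n")
        status = int(status_line.split(" ", 2)[1])
        headers = dict(line.split(": ", 1) for line in header_lines)
        yield fields["WARC-Target-URI"], status, headers, body


def load_cache(stream):
    """Index response records by URL; a later record for the same URL wins."""
    return {
        url: (status, headers, body)
        for url, status, headers, body in read_response_records(stream)
    }


if __name__ == "__main__":
    buf = io.BytesIO()
    write_response_record(buf, "https://example.com/", 200,
                          {"Content-Type": "text/html"}, b"<html>hi</html>")
    buf.seek(0)
    print(load_cache(buf)["https://example.com/"][0])  # prints 200
```

To plug this into Scrapy, the write call would go in a storage class's `store_response(spider, request, response)` method and the lookup in `retrieve_response(spider, request)` (returning `None` on a cache miss). The spider would then opt in with `HTTPCACHE_ENABLED = True` and `HTTPCACHE_STORAGE` set to that class's import path, which would be project-specific. A production version would more likely build on an existing WARC library such as warcio than on hand-rolled format code like this.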

turicas self-assigned this Oct 29, 2019