
..... #16

Closed
ArturMakly opened this issue Jul 24, 2018 · 2 comments
@ArturMakly

hi Brendon!

  1. thanks for making this!

  2. what is the best way to PAUSE a crawl that is in progress?
    ...and then (when the user decides) to continue from exactly where it left off?

cheers!!

@brendonboshell
Owner

I would recommend using the RedisUrlList and running the crawler in a separate process that you can kill and resume as and when necessary. Calling Crawler's start method after you have called stop is problematic, because it doesn't properly handle the case where you resume while requests from the previous crawl are still outstanding.

With the RedisUrlList/DbUrlList, Supercrawler is designed to work in a distributed way, using Redis to store the crawl state (and to hold locks when a new page crawl is initiated). Hence, you can simply kill and start processes as necessary and Supercrawler will cope with this.
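A minimal sketch of this pattern, assuming a local Redis instance and the RedisUrlList/Crawler options described in Supercrawler's README (the host, port, seed URL, and interval values here are placeholders, not taken from this thread):

```javascript
// Sketch (untested): a resumable crawl process using Supercrawler's
// RedisUrlList. The crawl frontier lives in Redis, so killing and
// restarting this process "pauses" and "resumes" the crawl.
const supercrawler = require("supercrawler");

const crawler = new supercrawler.Crawler({
  // Store crawl state in Redis so it survives process restarts.
  urlList: new supercrawler.RedisUrlList({
    redis: { host: "127.0.0.1", port: 6379 }
  }),
  interval: 1000,
  concurrentRequestsLimit: 5
});

// Seed the crawl; this is a no-op if the URL is already in the list.
crawler.getUrlList()
  .insertIfNotExists(new supercrawler.Url("https://example.com/"))
  .then(() => crawler.start());

// "Pausing" is just stopping the process; state stays in Redis.
process.on("SIGTERM", () => {
  crawler.stop();
  process.exit(0);
});
```

With this setup, any pages that were mid-crawl when the process died are retried once their Redis locks expire, which is why killing the process is safer than calling start again after stop in the same process.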

@ArturMakly
Author

Brendon once again.. you are a rock-star. thanks

@ArturMakly ArturMakly changed the title How to Pause & Restart a Crawl where it left off? ..... Sep 16, 2018