
..... #16

Closed
ArturMakly opened this issue Jul 24, 2018 · 2 comments
@ArturMakly

hi Brendon!

  1. thanks for making this!

  2. what is the best way to PAUSE a crawl that is in progress?
    ...and then (when the user decides) to continue from exactly where it left off?

cheers!!

@brendonboshell
Owner

I would recommend using the RedisUrlList and running the crawler in a separate process that you can kill and resume as and when necessary. Calling Crawler's start method after you have called stop is problematic, because it doesn't properly handle the case where you resume while requests from the previous crawl are still outstanding.

With the RedisUrlList/DbUrlList, Supercrawler is designed to work in a distributed way, using Redis to store the crawl state (and to hold locks when a new page crawl is initiated). Hence, you can simply kill and start processes as necessary and Supercrawler will cope with this.
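A minimal sketch of this pattern, assuming a local Redis instance and the RedisUrlList/Crawler options described in Supercrawler's README (the host, port, seed URL, and interval values here are placeholders, not taken from this thread):

```javascript
// Sketch (untested): a resumable crawl process using Supercrawler's
// RedisUrlList. The crawl frontier lives in Redis, so killing and
// restarting this process "pauses" and "resumes" the crawl.
const supercrawler = require("supercrawler");

const crawler = new supercrawler.Crawler({
  // Store crawl state in Redis so it survives process restarts.
  urlList: new supercrawler.RedisUrlList({
    redis: { host: "127.0.0.1", port: 6379 }
  }),
  interval: 1000,
  concurrentRequestsLimit: 5
});

// Seed the crawl; this is a no-op if the URL is already in the list.
crawler.getUrlList()
  .insertIfNotExists(new supercrawler.Url("https://example.com/"))
  .then(() => crawler.start());

// "Pausing" is just stopping the process; state stays in Redis.
process.on("SIGTERM", () => {
  crawler.stop();
  process.exit(0);
});
```

With this setup, any pages that were mid-crawl when the process died are retried once their Redis locks expire, which is why killing the process is safer than calling start again after stop in the same process.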

@ArturMakly
Author

Brendon once again.. you are a rock-star. thanks

@ArturMakly ArturMakly changed the title How to Pause & Restart a Crawl where it left off? ..... Sep 16, 2018