feat: stopping the crawlers gracefully with BasicCrawler.stop()
#2792
base: master
yes
makes sense |
Btw the purging in |
More of a question: in the Python PR, `stop` was implemented by modifying `__is_finished_function`, so it avoids calling `abort` explicitly. Here it calls `abort` explicitly. I guess we should decide on one way and align both?
crawlee/packages/basic-crawler/src/internals/basic-crawler.ts Lines 983 to 987 in d5e469a
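The two strategies in the question can be contrasted with a small single-threaded model. This is a hypothetical sketch: `ToyPool`, `runWithFinishFlag` and `runWithAbort` are illustrative names, not Crawlee's `AutoscaledPool` or the Python crawler's `__is_finished_function`. Approach (a) folds a stop flag into the pool's "is finished" predicate so the loop ends on its own; approach (b) calls `abort()` on the pool from inside a handler. The model has no concurrency, so it does not capture how either strategy treats in-flight requests in the real crawlers.

```typescript
// Hypothetical single-threaded model of a task pool; NOT Crawlee's AutoscaledPool.
type Task = () => void;

class ToyPool {
  private aborted = false;

  constructor(
    private readonly queue: Task[],
    private readonly isFinished: () => boolean,
  ) {}

  // Approach (b): end the loop explicitly from the outside.
  abort(): void {
    this.aborted = true;
  }

  // Runs queued tasks one by one and returns how many were executed.
  run(): number {
    let executed = 0;
    while (!this.aborted && !this.isFinished()) {
      const task = this.queue.shift();
      if (task === undefined) break;
      task();
      executed++;
    }
    return executed;
  }
}

// Approach (a): a stop flag is folded into the "is finished" predicate,
// so the pool simply sees itself as done and exits its loop.
function runWithFinishFlag(total: number, stopAfter: number): number {
  let handled = 0;
  let stopRequested = false;
  const queue: Task[] = [];
  for (let i = 0; i < total; i++) {
    queue.push(() => {
      handled++;
      if (handled === stopAfter) stopRequested = true; // "stop()" from a handler
    });
  }
  const pool = new ToyPool(queue, () => stopRequested || queue.length === 0);
  return pool.run();
}

// Approach (b): the handler aborts the pool directly.
function runWithAbort(total: number, stopAfter: number): number {
  let handled = 0;
  const queue: Task[] = [];
  const pool = new ToyPool(queue, () => queue.length === 0);
  for (let i = 0; i < total; i++) {
    queue.push(() => {
      handled++;
      if (handled === stopAfter) pool.abort(); // "stop()" from a handler
    });
  }
  return pool.run();
}
```

In this model both approaches end after the same task, which matches the "same result" reading; the difference would only show up once tasks run concurrently.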
See the comments - TLDR: I think we both arrived at the same result, this version uses the |
Co-authored-by: Martin Adámek <banan23@gmail.com>
Force-pushed from ba466a7 to e7388de
Allows users to call `crawler.stop()` to gracefully stop the crawler.

Currently, the crawlers are "stateless", i.e. calling `crawler.run()` twice for `example.com` will only crawl `example.com` once, then stop and purge the request queue (RQ) and dataset, so the second `crawler.run()` call will yield no results.

I suppose this is expected, but we could easily add a `crawler.pause()` method, which would keep the inner state for resuming with `crawler.run()`.

Closes #2777
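A minimal sketch of how a graceful `stop()` and the proposed `pause()` could differ, assuming a toy in-memory crawler rather than Crawlee's `BasicCrawler`. `ToyCrawler`, its handler signature, and the choice that `stop()` drops pending requests are all assumptions made for illustration. In both cases the request being handled when the call is made still finishes; only `pause()` keeps the pending requests so that a later `run()` resumes them.

```typescript
// Hypothetical toy crawler illustrating the proposed stop()/pause() semantics.
// This is NOT Crawlee's BasicCrawler; names and behaviour are assumptions.
type Handler = (url: string, crawler: ToyCrawler) => void;

class ToyCrawler {
  private readonly pending: string[] = [];
  private readonly seen = new Set<string>();
  readonly handled: string[] = [];
  private haltRequested = false;

  constructor(private readonly handler: Handler) {}

  // Graceful stop: the current request finishes, the run ends,
  // and (in this model) pending requests are discarded.
  stop(): void {
    this.haltRequested = true;
    this.pending.length = 0;
  }

  // Proposed pause: the run ends after the current request,
  // but pending requests are kept so a later run() resumes them.
  pause(): void {
    this.haltRequested = true;
  }

  // Enqueues new URLs (deduplicated, like a request queue) and processes
  // them until the queue is empty or stop()/pause() is called.
  run(urls: string[] = []): string[] {
    this.haltRequested = false;
    for (const url of urls) {
      if (!this.seen.has(url)) {
        this.seen.add(url);
        this.pending.push(url);
      }
    }
    const handledThisRun: string[] = [];
    while (!this.haltRequested && this.pending.length > 0) {
      const url = this.pending.shift()!;
      this.handler(url, this);
      this.handled.push(url);
      handledThisRun.push(url);
    }
    return handledThisRun;
  }
}
```

With this model, pausing on `"b"` and calling `run()` again handles `"c"` and `"d"`, while stopping on `"b"` leaves nothing for the next `run()`. Keeping `pending` on pause is what makes resuming possible.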