I am currently downloading a site using Heritrix, and I don't exactly want to leave my computer on overnight. Can I simply stop a crawl and resume it later?

Taking a look at https://heritrix.readthedocs.io/en/latest/operating.html#full-recovery, I determined that if I were to 'accidentally' crash the Java program, I could put the `/jobs/x/date/logs/frontier.recover.gz` file in `/jobs/x/action`, restart the server, and launch the job again to resume it. Is this correct, or were crawls meant to be a do-it-all-right-now thing? I've tried this and it didn't really work: I used `kill PID` on the server in my terminal and relaunched it, only to see that it started scraping under a new directory and moved my `frontier.recover.gz` file to `/jobs/x/action/done`, so I'm not sure whether it worked.
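In case the exact sequence matters, here is roughly what I ran, with `x` standing in for my job name, `date` for the dated launch directory, and `PID` for the Heritrix process ID; the startup command is the stock Heritrix 3 one, so adjust the admin login and paths for your own setup:

```sh
# Hard-stop the running Heritrix process:
kill PID

# Copy the recovery journal into the job's action directory:
cp jobs/x/date/logs/frontier.recover.gz jobs/x/action/

# Start Heritrix again, then build and launch the job from the
# web UI at https://localhost:8443/ so it picks up the journal:
bin/heritrix -a admin:admin
```

As I understand it, files dropped into the action directory are moved to `action/done` once Heritrix has consumed them, which is what I observed.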