Scrapes appear frozen from time to time in Zimfarm #1756
Works locally, and works with Docker; both enter the article-downloading stage fine. Is there anything special on the Zimfarm?
@uriesk not that I know... and we used to download lists from that server... but never of that size
@uriesk https://kinsta.com/blog/increase-max-upload-size-wordpress/#increase-the-max-upload-file-size-in-nginx suggests an NGINX error.
Others recommend increasing these too:
Followed by something like:
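(The original config snippets were stripped out of this comment during extraction; the following is a hedged reconstruction based on the linked Kinsta article. The directive names are real NGINX settings, but the values shown are assumptions, not what the commenter actually posted.)

```nginx
# nginx.conf: raise the request-body (upload) limit, whose default is 1M
http {
    client_max_body_size 256M;

    # Timeouts commonly raised alongside it (assumed values):
    client_body_timeout 300s;
    send_timeout        300s;
}
```

Typically followed by a config check and reload, e.g. `nginx -t && nginx -s reload`.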
[Later follow-up:]
@rgaudin Any idea? These errors are in the task-worker log https://farm.openzim.org/pipeline/32b22c3a583bd32c94c53d36/debug
137 is OOM (the container was killed for running out of memory).
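For context (this explanation is not from the thread itself): Docker reports 128 + the signal number when a container is killed by a signal, so exit code 137 means signal 9, SIGKILL, which is what the kernel OOM killer sends. A quick sanity check:

```shell
# Exit codes above 128 encode "killed by signal (code - 128)".
sig=$((137 - 128))       # 9
name=$(kill -l "$sig")   # KILL, i.e. SIGKILL: the kernel OOM killer's signal
echo "$sig $name"
```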
Ah sorry, yes, I've seen those. We need a ticket on ZF, but it doesn't affect the scraping.
@rgaudin OK, I have already increased the available memory and restarted a scrape.
@rgaudin Almost all the time (we had one which somehow managed to get through this and then died later with
I'll look into it, but the task you referenced here failed after 1 day and 50 min… not exactly upon startup.
Indeed, this time https://farm.openzim.org/recipes/wikipedia_en_top1m failed after about half a day, according to https://farm.openzim.org/pipeline/81507cc3e015578b61a95d36 ("10 hours, 50 minutes"?). ASIDE: This ~40 GB ZIM file will be a lifeline for people who just cannot afford large microSD cards, in essentially all countries. So I'd like to help wherever I can!
Better not run it on verbose. The three oldest available builds failed with error code 137 (out-of-memory).
So just one failed article, which we can exclude. I think the Zimfarm just has some issues with the massive verbose output of large scrapes.
wikipedia_en_top1m never worked
This doesn't match the ticket description at all. This looks like a memory-hungry task that didn't fit within 15GB. I see that another one has been launched following the openzim/zimfarm#738 fix. I doubt this would have much impact, as that was just preventing the task worker from uploading the log. Even if that log was kept in memory, we're talking about 500MB… and the task worker is not resource-limited. So for it to have an impact, it would require the worker to be completely maxed out on RAM and hope that the kernel decides to kill the scraper… I am ruling Zimfarm out for the moment; please let me know if your findings lead back to Zimfarm. I would suggest you test locally by specifying a memory resource limit on your docker command.
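The suggestion above could be sketched as follows, assuming the official openzim/mwoffliner image and standard Docker flags. The 15g value mirrors the limit mentioned in this comment; the wiki URL, email, and container name are placeholders, not the recipe's actual configuration, and this requires a local Docker daemon:

```shell
# Cap the container at 15GB so an OOM kill reproduces the Zimfarm behavior
docker run --name oom-test --memory=15g --memory-swap=15g \
  openzim/mwoffliner \
  mwoffliner --mwUrl=https://en.wikipedia.org/ --adminEmail=you@example.com

# After it exits: did the kernel OOM-kill it, and was the exit code 137?
docker inspect oom-test --format '{{.State.OOMKilled}} {{.State.ExitCode}}'
```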
@rgaudin The Zimfarm will not be the reason why the scrapes fail. But it can be the reason why they appear frozen (even if they might not be). All those canceled ones had a frozen output. They should have either given us an OOM error, or a running output, or whatever error actually appeared (like the one a day ago). So I ask you to stick around, watch those builds, and if something appears to be frozen... check what is going on in the container. I cannot locally test a full scrape of a 6-million-article Wikipedia like ceb or en, or even just a 1-million one. I can just check if it reaches one of the earlier stages without freezing. We can rename this issue to
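When a run looks frozen, a few standard Docker commands run on the worker host can help tell a truly stuck scraper from one that merely stopped logging (`<container>` is a placeholder for the task's container name):

```shell
docker stats --no-stream <container>            # near-zero CPU over several samples hints at a real hang
docker top <container>                          # which processes are alive inside the container
docker logs --timestamps --tail 50 <container>  # when the last output was actually emitted
docker exec <container> ps aux                  # look for a wedged child process (if ps exists in the image)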
That makes more sense. I understand that, indeed, if you rely on timestamps in the live-updated stdout to tell a running task from a stuck one, then the ZF issue would have made you think those tasks were stuck. That said:
I guess none of this matters now that we have eliminated the main culprit. Hopefully, the current run will enlighten us. As for monitoring, ping me here or on Slack with a task when you want me to connect and find information for you; I'd be happy to help.
Because they appeared stuck right after starting redis, before any scraping started. Thanks, let's hope it works out 👍
FWIW yesterday's run contained:
Does
Thank you to @kelson42, who launched another ZF attempt 3.5 hours ago:
Both were from before the fix. Sorry about the timing conflict.
No, it was killed by Docker due to lack of RAM.
I'd like to mention that yesterday's run had
The new one running is bound to
I should have seen & realized 137 was OOM yesterday. Right! (5 GB extra RAM during each attempt can't hurt, if indeed it's that simple!) 🙏
Great question @uriesk, given the importance of rapid/continuous improvement:
ASIDE, "Scrapper Log" should really be "Scraper Log" on every job's "Debug" tab (Debug page) like: |
Fixed the typo (openzim/zimfarm@f83b6ee). Please refresh. Note that I accidentally understood what you meant while I was following this ticket but I don't contribute to mwoffliner so zimfarm bugs have limited chances to get fixed from here 😉 |
So far so good — the latest scrape of wikipedia_en_top1m is still running after almost 34h:
It's done, with an epic 200 GB file 🤔 And one
Yes, I get the
Very awesome, it appears so close; thanks to everyone! Can someone restart the job with
ASIDE: Why does wikipedia_en_all_maxi use
@holta yes, will do
Heroic. Millions of people should provide their thanks to you and everyone helping here. When we're all done — reliably + compactly delivering the "world's knowledge" (i.e. a very meaningful snapshot subset of Wikipedia, in English for starters) every single month:
|
@uriesk thanks for your help completing this major accomplishment — that has the very real potential to help millions of kids — getting legit-not-stale encyclopedic knowledge every single month: 🎆
(i.e. if the wikipedia_en_top1m scraping recipe hopefully proves much more reliable/ruggedized than https://farm.openzim.org/recipes/wikipedia_en_all_maxi ❓❗)
This recipe has its very own freeze pattern at the very start, even before scraping begins, it looks like!
See https://farm.openzim.org/pipeline/32b22c3a583bd32c94c53d36/debug