WPEN is not scrapeable anymore in 16GB of RAM #948

Closed
kelson42 opened this issue Aug 22, 2019 · 13 comments

@kelson42
Collaborator

See https://farm.openzim.org/tasks/5d5c519f1339d1137618f4b1
Additional work is needed to reduce memory consumption.

@kelson42 kelson42 added the bug label Aug 22, 2019
@kelson42 kelson42 added this to the 1.9-maintenance milestone Aug 22, 2019
@kelson42
Collaborator Author

kelson42 commented Aug 22, 2019

@ISNIT0 @mgautierfr Any idea how we could do better?

@kelson42
Collaborator Author

Here is the monitoring at that time: [image]

@kelson42
Collaborator Author

@ISNIT0 I'm no longer sure that mwoffliner crashed because of a lack of memory, considering that we still had a lot of cached memory.

@kelson42 kelson42 self-assigned this Aug 24, 2019
@kelson42
Collaborator Author

@ISNIT0 I have asked @rgaudin to restart a scrape; the error looks quite weird to me. I want to see if the problem is reproducible.

@ISNIT0
Contributor

ISNIT0 commented Aug 27, 2019

@kelson42 Any news?

Looks like it fails when trying to resize the favicon, not sure what your monitoring says:

Failed to run mwoffliner after [116020s]: {
	"stack": "Error: spawn ENOMEM\n    at ChildProcess.spawn (internal/child_process.js:366:11)\n    at Object.spawn (child_process.js:551:9)\n    at exec2 (/tmp/mwoffliner/node_modules/imagemagick/imagemagick.js:24:25)\n    at Object.exports.convert (/tmp/mwoffliner/node_modules/imagemagick/imagemagick.js:253:10)\n    at Promise (/tmp/mwoffliner/lib/mwoffliner.lib.js:423:47)\n    at new Promise (<anonymous>)\n    at resizeFavicon (/tmp/mwoffliner/lib/mwoffliner.lib.js:422:28)\n    at /tmp/mwoffliner/lib/mwoffliner.lib.js:439:28\n    at Generator.next (<anonymous>)\n    at /tmp/mwoffliner/lib/mwoffliner.lib.js:9:71\n    at new Promise (<anonymous>)\n    at __awaiter (/tmp/mwoffliner/lib/mwoffliner.lib.js:5:12)\n    at saveFavicon (/tmp/mwoffliner/lib/mwoffliner.lib.js:419:20)\n    at /tmp/mwoffliner/lib/mwoffliner.lib.js:337:23\n    at Generator.next (<anonymous>)\n    at fulfilled (/tmp/mwoffliner/lib/mwoffliner.lib.js:6:58)\n    at process._tickCallback (internal/process/next_tick.js:68:7)",
	"message": "spawn ENOMEM",
	"errno": "ENOMEM",
	"code": "ENOMEM",
	"syscall": "spawn"
}


**********

spawn ENOMEM

**********

MWO has scraped all the content, but post-processing the favicon seems to crash...
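
The failure above can be recognised from the fields on the error object shown in the log (`code` and `syscall`). A minimal sketch, assuming a hypothetical helper named `isSpawnOutOfMemory` that is not part of mwoffliner:

```javascript
// Hypothetical helper (not part of mwoffliner): detects the failure mode
// shown in the log above, where spawning a child process fails with ENOMEM.
function isSpawnOutOfMemory(err) {
  return Boolean(err) && err.code === "ENOMEM" && err.syscall === "spawn";
}

// Example error shaped like the one in the log.
const sample = Object.assign(new Error("spawn ENOMEM"), {
  errno: "ENOMEM",
  code: "ENOMEM",
  syscall: "spawn",
});

console.log(isSpawnOutOfMemory(sample)); // true
console.log(isSpawnOutOfMemory(new Error("other"))); // false
```

A check like this could be used to log a clearer message or to retry the step, but it does not address the underlying cause.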

@rgaudin
Member

rgaudin commented Aug 27, 2019

Failed again in the favicon resize. https://farm.openzim.org/tasks/5d611a341339d1137618f565

@kelson42 kelson42 removed their assignment Aug 27, 2019
@ISNIT0
Contributor

ISNIT0 commented Aug 27, 2019

Looks like it's related to this: nodejs/node#25382

@kelson42
Collaborator Author

@ISNIT0 You told me that the problem probably comes from the fact that a process fork happens here at a time when the Node.js process is already quite big. Maybe the solution is simply to move this image resize to the start of the scraping process?
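
The reordering idea can be sketched as follows. This is only an illustration: the step names, the `needsChildProcess` flag, and the helpers `orderSteps` and `rssMiB` are hypothetical and not taken from mwoffliner. It puts steps that spawn a child process first, and shows how the parent's resident memory could be logged right before a spawn to confirm the suspicion.

```javascript
// Sketch only: step names and the `needsChildProcess` flag are hypothetical,
// not taken from mwoffliner.
function orderSteps(steps) {
  // Stable partition: steps that spawn a child process go first, while the
  // Node.js process is still small; the rest keep their relative order.
  const early = steps.filter((s) => s.needsChildProcess);
  const late = steps.filter((s) => !s.needsChildProcess);
  return [...early, ...late];
}

// Resident set size of the current process in MiB, useful to log right
// before a spawn to see how large the parent is at that moment.
function rssMiB() {
  return Math.round(process.memoryUsage().rss / (1024 * 1024));
}

const plan = orderSteps([
  { name: "downloadArticles", needsChildProcess: false },
  { name: "resizeFavicon", needsChildProcess: true },
  { name: "writeZim", needsChildProcess: false },
]);

console.log(plan.map((s) => s.name).join(" -> "));
// resizeFavicon -> downloadArticles -> writeZim
console.log(`RSS before spawning: ${rssMiB()} MiB`);
```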

@ISNIT0
Contributor

ISNIT0 commented Sep 2, 2019

It seems imagemin (our image compression library) also suffers from this problem, so we could potentially save even more memory by finding a Node.js configuration option that addresses it. See nodejs/node#25382 (comment)

The favicon is done at the end because favicon downloading etc. is handled the same way any other media file is downloaded.

@ISNIT0
Contributor

ISNIT0 commented Sep 2, 2019

This PR may help: #968

@kelson42
Collaborator Author

kelson42 commented Sep 9, 2019

@ISNIT0 Still the same problem:

T:91842; 16000/19032
T:91844; 17000/19032
T:91846; 18000/19032
T:91848; 19000/19032
T:91900;  write checksum
T:91900; finish
[log] [2019-09-09T15:10:53.780Z] Summary of scrape actions: {
	"files": {
		"success": 30889,
		"fail": 0
	},
	"articles": {
		"success": 5925375,
		"fail": 11
	},
	"redirects": {
		"written": 8408528
	}
}
[log] [2019-09-09T15:10:53.787Z] Finished dump
[log] [2019-09-09T15:10:53.794Z] Doing dump
[log] [2019-09-09T15:10:53.798Z] Writing zim to [/output/wikipedia_en_all_nopic_2019-09.zim]
[log] [2019-09-09T15:10:53.799Z] Flushing Redis file store
[log] [2019-09-09T15:10:54.098Z] Found [5] stylesheets to download
[log] [2019-09-09T15:10:54.098Z] Downloading stylesheets and populating media queue
[log] [2019-09-09T15:10:54.166Z] Downloaded stylesheets
[log] [2019-09-09T15:10:54.167Z] Saving favicon.png...
Failed to run mwoffliner after [102343s]: {
	"stack": "Error: spawn ENOMEM\n    at ChildProcess.spawn (internal/child_process.js:366:11)\n    at Object.spawn (child_process.js:551:9)\n    at exec2 (/tmp/mwoffliner/node_modules/imagemagick/imagemagick.js:24:25)\n    at Object.exports.convert (/tmp/mwoffliner/node_modules/imagemagick/imagemagick.js:253:10)\n    at Promise (/tmp/mwoffliner/lib/mwoffliner.lib.js:417:47)\n    at new Promise (<anonymous>)\n    at resizeFavicon (/tmp/mwoffliner/lib/mwoffliner.lib.js:416:28)\n    at /tmp/mwoffliner/lib/mwoffliner.lib.js:433:28\n    at Generator.next (<anonymous>)\n    at /tmp/mwoffliner/lib/mwoffliner.lib.js:9:71\n    at new Promise (<anonymous>)\n    at __awaiter (/tmp/mwoffliner/lib/mwoffliner.lib.js:5:12)\n    at saveFavicon (/tmp/mwoffliner/lib/mwoffliner.lib.js:413:20)\n    at /tmp/mwoffliner/lib/mwoffliner.lib.js:331:23\n    at Generator.next (<anonymous>)\n    at fulfilled (/tmp/mwoffliner/lib/mwoffliner.lib.js:6:58)\n    at process._tickCallback (internal/process/next_tick.js:68:7)",
	"message": "spawn ENOMEM",
	"errno": "ENOMEM",
	"code": "ENOMEM",
	"syscall": "spawn"
}

@kelson42 kelson42 reopened this Sep 9, 2019
@kelson42
Collaborator Author

Maybe using a library like https://github.com/lovell/sharp would fix the problem?
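
A rough sketch of what that could look like, assuming the `sharp` package is installed. The function names and the 48x48 target size are illustrative assumptions, not mwoffliner's actual code. As far as I know, sharp does the resize inside the Node.js process through libvips, so no child process is spawned for the conversion.

```javascript
// Sketch only: assumes the `sharp` package is installed (npm install sharp).
// The 48x48 target size and the function names are illustrative assumptions.
function faviconResizeOptions(size = 48) {
  // Options object accepted by sharp's resize().
  return { width: size, height: size };
}

async function resizeFaviconInProcess(inputPath, outputPath, size = 48) {
  // Loaded lazily so this file can be parsed without sharp installed.
  const sharp = require("sharp");
  // The resize runs in-process, so the spawn call that fails with ENOMEM
  // in the stack trace is not involved.
  await sharp(inputPath)
    .resize(faviconResizeOptions(size))
    .png()
    .toFile(outputPath);
}

console.log(faviconResizeOptions());
```

Whether this avoids the crash in practice would still need to be checked with a full scrape.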

@stale

stale bot commented Nov 14, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Nov 14, 2019
@kelson42 kelson42 modified the milestones: 1.9-maintenance, 2.0 Apr 9, 2020
@stale stale bot removed stale labels Apr 9, 2020
@kelson42 kelson42 assigned kelson42 and unassigned ISNIT0 Apr 25, 2020