Reduce memory requirements of pdf2svg.js example to avoid OOM #8540

Merged
merged 2 commits into from
Jun 20, 2017


Rob--W
Member

@Rob--W Rob--W commented Jun 18, 2017

See the individual commit messages for more details. I verified that the output for the first 9 pages is identical before and after my patches:

$ gdb -ex run --args \
 node --max_old_space_size=200 \
 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf

$ md5sum svgdump/FatalProcessOutOfMemory-[1-9].svg
8d3b9b5ff4bffef8423346f3b964a96f  svgdump/FatalProcessOutOfMemory-1.svg
0c00d37793523580bb45af64cf0b0510  svgdump/FatalProcessOutOfMemory-2.svg
2efeb696ec79a5c2e1318ee8d6c26d78  svgdump/FatalProcessOutOfMemory-3.svg
898f49bbda623137f5edc985bfafcac3  svgdump/FatalProcessOutOfMemory-4.svg
00628f7ebc47afbbc2992b85da7a0828  svgdump/FatalProcessOutOfMemory-5.svg
dec693597bc61286d10b4f0af8b468ea  svgdump/FatalProcessOutOfMemory-6.svg
6c047f815efd2816ace541a23487f48f  svgdump/FatalProcessOutOfMemory-7.svg
f702367a6b8903273a42277a12d40d2d  svgdump/FatalProcessOutOfMemory-8.svg
cd1867d8cb8b88b5d16e3c31b51ef2b5  svgdump/FatalProcessOutOfMemory-9.svg

(Note that if you wait until the PDF-to-SVG conversion of the example from #8534 finishes, you end up with a svgdump directory of 1.5G.)

Fixes #8534

@doublex

doublex commented Jun 18, 2017

@Rob--W
Works great! Do you want a $50 bounty?

@Rob--W
Member Author

Rob--W commented Jun 19, 2017

@doublex Sure, why not. I already appreciate the good bug reports from you, thanks for those!

Contributor

@yurydelendik yurydelendik left a comment


Now, that is confusing. Less memory garbage is generated when buf.push() and buf.join() are used instead of s += ... (that's true in other languages too, e.g. Java/C#).

} else {
return '<' + this.nodeName + ' ' + attrList.join(' ') + '>' +
Contributor

@yurydelendik yurydelendik Jun 19, 2017


I think the issue was only here: we were doubling the size of this.childNodes.join(''). I think that replacing str += with buf.push(), and calling buf.join('') at the end, will free more memory.

Member Author


The issue is both in the attributes and the childNodes list. When I first changed attrList to +=, I was able to get 9 additional pages, whereas doing the full conversion gave only one more (on top of the 9). I haven't tested childNodes in isolation, but that is probably not too relevant.

A JS string in V8 can internally be represented in multiple ways. One of the representations is a sequence of substrings, which is especially useful in use cases involving concatenation, like this one. This string will be flattened at some point (e.g. when you read individual characters of the string).

Another reason for the current state of the patch (besides the improvement in peak memory usage) is that it becomes easier to implement svg-to-file streaming (like a readable stream). It is conceptually as simple as replacing += extraData with .write(extraData).
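
To illustrate the streaming idea, here is a minimal sketch. It assumes a simplified node shape (nodeName, attributes, childNodes, textContent) and a hypothetical serializeTo helper; it is not the actual code from the pdf.js example, and it skips attribute and text escaping. The point is only that a serializer which appends pieces in order can target anything with a write() method.

```javascript
// Hypothetical serializer: emits the markup piece by piece to `writer`,
// which only needs a write(chunk) method. No escaping is done here.
function serializeTo(node, writer) {
  if (node.nodeName === '#text') {
    writer.write(node.textContent);
    return;
  }
  writer.write('<' + node.nodeName);
  const attributes = node.attributes || {};
  for (const name of Object.keys(attributes)) {
    writer.write(' ' + name + '="' + attributes[name] + '"');
  }
  const children = node.childNodes || [];
  if (children.length === 0) {
    writer.write('/>');
    return;
  }
  writer.write('>');
  for (const child of children) {
    serializeTo(child, writer);
  }
  writer.write('</' + node.nodeName + '>');
}

// A sink that behaves like `str += extraData`.
function StringSink() {
  this.value = '';
}
StringSink.prototype.write = function (chunk) {
  this.value += chunk;
};

// Made-up example tree, only for demonstration.
const exampleTree = {
  nodeName: 'svg:svg',
  attributes: { width: '10' },
  childNodes: [
    { nodeName: 'svg:g' },
    { nodeName: '#text', textContent: 'hi' },
  ],
};

const sink = new StringSink();
serializeTo(exampleTree, sink);
```

Because a stream returned by Node's fs.createWriteStream() also exposes write(), the same serializeTo call could in principle be pointed at a file instead of a string, although a real implementation would also need to handle backpressure.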

Contributor


> It is conceptually as simple as replacing += extraData with .write(extraData)

Yeah, the extent of the current changes is good. I just want to replace str += with buf.push() verbatim, which will fit the agenda above.

> One of the representations is a sequence of substrings...

I understand. But we can produce a really large string, and I don't want JS JITs to make the call on how to handle this data internally.

Member Author


I applied the changes. Now only 19 pages are rendered before crashing (instead of 20 with +=), but the intent is more clear now.
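
For readers following the thread, the two approaches under discussion can be sketched roughly as follows. The node shape and the function names (serializeNested, appendNode, serializeBuffered) are assumptions made for this illustration, not the code that was merged. The nested version builds a complete intermediate string for every subtree, while the buffered version pushes pieces into one shared array and joins once at the end.

```javascript
// Nested version: each level joins its children into a new string, so the
// markup of a deep subtree is copied again at every enclosing level.
function serializeNested(node) {
  const attrs = Object.keys(node.attributes).map(function (name) {
    return name + '="' + node.attributes[name] + '"';
  });
  const open = '<' + [node.nodeName].concat(attrs).join(' ');
  if (node.childNodes.length === 0) {
    return open + '/>';
  }
  return open + '>' + node.childNodes.map(serializeNested).join('') +
         '</' + node.nodeName + '>';
}

// Buffered version: every piece is pushed onto one shared array.
function appendNode(node, buf) {
  buf.push('<', node.nodeName);
  for (const name of Object.keys(node.attributes)) {
    buf.push(' ', name, '="', node.attributes[name], '"');
  }
  if (node.childNodes.length === 0) {
    buf.push('/>');
    return;
  }
  buf.push('>');
  for (const child of node.childNodes) {
    appendNode(child, buf);
  }
  buf.push('</', node.nodeName, '>');
}

function serializeBuffered(node) {
  const buf = [];
  appendNode(node, buf);
  return buf.join(''); // single join at the very end
}

// Made-up example tree, only for demonstration.
const sampleTree = {
  nodeName: 'svg:svg',
  attributes: { width: '10' },
  childNodes: [
    { nodeName: 'svg:g', attributes: {}, childNodes: [] },
    { nodeName: 'svg:path', attributes: { d: 'M0 0' }, childNodes: [] },
  ],
};

const nestedResult = serializeNested(sampleTree);
const bufferedResult = serializeBuffered(sampleTree);
```

Both functions produce the same markup; they differ only in how many intermediate strings exist along the way. How much memory each one actually uses depends on the engine, as the page counts reported above suggest.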

Rob--W added 2 commits June 19, 2017 21:52
Test case:
Using the PDF file from mozilla#8534
node --max_old_space_size=200 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf

Before this patch:
Node.js crashes due to OOM after processing 10 pages.

After this patch:
Node.js crashes due to OOM after processing 19 pages.
Wait for the completion of writing the generated SVG file before
processing the next page. This is to enable the garbage collector to
garbage-collect the (potentially large) SVG string before trying to
allocate memory again for the next page.

Note that since the PDF-to-SVG conversion is now sequential instead of
parallel, the time to generate all pages increases.

Test case:
node --max_old_space_size=200 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf

Before this patch:
- Node.js crashes due to OOM after processing 20 pages.

After this patch:
- Node.js is able to convert all 203 PDFs to SVG without crashing.
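
The sequential behavior described in the second commit message can be sketched with a promise chain. This is an illustration under assumptions: processSequentially, fakeRender and fakeWrite are hypothetical names, and the fake functions stand in for the real page rendering and file writing. The chain starts rendering the next page only after the previous page's write has completed, which lets the previous SVG string become unreachable first.

```javascript
// Hypothetical helper: handles pages strictly one after another.
// renderPage(pageNum) must return a promise for the SVG string;
// writePage(pageNum, svgString) must return a promise that settles
// once the data has been written.
function processSequentially(pageNumbers, renderPage, writePage) {
  let chain = Promise.resolve();
  pageNumbers.forEach(function (pageNum) {
    chain = chain.then(function () {
      return renderPage(pageNum);
    }).then(function (svgString) {
      // The next page is not started until this promise resolves.
      return writePage(pageNum, svgString);
    });
  });
  return chain;
}

// Stand-ins for rendering and writing, used only to show the ordering.
const eventLog = [];

function fakeRender(pageNum) {
  eventLog.push('render ' + pageNum);
  return Promise.resolve('<svg:svg>page ' + pageNum + '</svg:svg>');
}

function fakeWrite(pageNum, svgString) {
  return new Promise(function (resolve) {
    // Simulate a slow asynchronous write.
    setTimeout(function () {
      eventLog.push('written ' + pageNum);
      resolve();
    }, 5);
  });
}

const demoDone = processSequentially([1, 2, 3], fakeRender, fakeWrite)
  .then(function () {
    return eventLog;
  });
```

The trade-off is the one the commit message names: total conversion time goes up because pages no longer overlap, while at most one page's SVG string needs to be alive at a time.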
@yurydelendik yurydelendik merged commit 9bed695 into mozilla:master Jun 20, 2017
@yurydelendik
Contributor

Thank you for the patch.

movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018
Reduce memory requirements of pdf2svg.js example to avoid OOM