-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce memory requirements of pdf2svg.js example to avoid OOM #8540
Conversation
@Rob--W |
@doublex Sure, why not. I already appreciate the good bug reports from you, thanks for those! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, that's is confusing. Less memory garbage is generated when buf.push();
and buf.join()
is used instead of s += ...
(that's true in other languages too e.g. Java/C#)
examples/node/domstubs.js
Outdated
} else { | ||
return '<' + this.nodeName + ' ' + attrList.join(' ') + '>' + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking the issue was only here: we where doubling size of this.childNodes.join('')
. I'm thinking that replacing str +=
with buf.push()
and buf.join('')
at the end will free more memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is both in the attributes and the childNodes list. When I first changed attrList
to +=
, I was able to get 9 additional pages, whereas doing the full conversion gave only one more (on top of the 9). I haven't tested childNodes
in isolation, but that is probably not too relevant.
A JS string in V8 can internally be represented in multiple ways. One of the representations is a sequence of substrings, which is especially useful in use cases involving concatenation, like this one. This string will be flattened at some point (e.g. when you read individual characters of the string).
Another reason for the current state of the patch (besides the boost in (peak) memory usage) is that it becomes easier to implement svg-to-file streaming (like a readable stream). It is conceptually as simple as replacing += extraData
with .write(extraData)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is conceptually as simple as replacing += extraData with .write(extraData)
Yeah, the extent of the current changes is good. I just want to replace str +=
to buf.push()
verbatim, which will fit the agenda above.
One of the representations is a sequence of substrings...
I understand. But we can produce really large string and I don't want JS JITs to make the call how to handle this data internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I applied the changes. Now only 19 pages are rendered before crashing (instead of 20 with +=
), but the intent is more clear now.
Test case: Using the PDF file from mozilla#8534 node --max_old_space_size=200 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf Before this patch: Node.js crashes due to OOM after processing 10 pages. After this patch: Node.js crashes due to OOM after processing 19 pages.
Wait for the completion of writing the generated SVG file before processing the next page. This is to enable the garbage collector to garbage-collect the (potentially large) SVG string before trying to allocate memory again for the next page. Note that since the PDF-to-SVG conversion is now sequential instead of parallel, the time to generate all pages increases. Test case: node --max_old_space_size=200 examples/node/pdf2svg.js /tmp/FatalProcessOutOfMemory.pdf Before this patch: - Node.js crashes due to OOM after processing 20 pages. After this patch: - Node.js is able to convert all 203 PDFs to SVG without crashing.
Thank you for the patch. |
Reduce memory requirements of pdf2svg.js example to avoid OOM
See the individual commit messages for more details. I verified that the output of the first 9 pages (before and after my patches) are identical:
(note that if you wait until the PDF-to-SVG conversion finishes for the given example from #8534, that you end up with a svgdump directory of 1.5G.)
Fixes #8534