Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Leak #378

Closed
coreyweidenhammer opened this issue Nov 15, 2018 · 34 comments
Closed

Memory Leak #378

coreyweidenhammer opened this issue Nov 15, 2018 · 34 comments
Labels

Comments

@coreyweidenhammer
Copy link

Before I get into the issue, I just wanted to thank @diegomura and others for your work and responsiveness on this wonderful library! Your efforts are greatly appreciated.

OS:

  • Amazon Linux aws-elasticbeanstalk-amzn-2018.03.0.x86_64-nodejs-hvm-201809210902 (ami-00d787dad6adc69a1) on AWS Elastic Beanstalk
  • Also, macOS 10.13.6

React-pdf version:
1.0.0-alpha.24, 1.0.0-alpha.25
Node v8.9.3

Description:
We are consistently observing a memory leak, with the memory footprint of our node process growing linearly with each generated PDF until eventually exhausting the resources on our servers.

My team saw #165 (now closed), but we are uncertain as to whether what we are experiencing is related or not. As noted above, we have tried using both alpha.24 and alpha.25 with the same results.

We first noticed the memory leak on servers running in AWS via ElasticBeanstalk (Amazon Linux) but have since been able to recreate it on our local development environments running macOS 10.13.6. We attempted isolating the source of the leak ourselves but have not yet come up with anything conclusive. We're not even sure if the memory leak is (as before) related to a dependency vs the react-pdf code itself, but we are hoping you might be able to identify something more readily.

How to replicate issue:
We created a small project with limited dependencies that should allow you to recreate the issue as we see it. Zip attached here:
pdfGenerator.zip

There are instructions in server.js, repeated below:

RUN $ yarn / npm install and then run: $ npm run start-with-chrome-inspect

  1. In a Chrome browser window, go to http://localhost:3000/.
  2. In a new tab go to: chrome://inspect/#devices and click inspect under the target.
    In the devtools window, select the Memory tab at the top.
    Select 'Heap snapshot'.
    Click 'Take snapshot'.
  3. Open http://localhost:3000/pdf in a new tab.
  4. Go back the devtools window and click the dark circle icon at top-left above the Profiles pane to generate another snap.
  5. Repeat steps 4 and 5 repeatedly and you can see the retained memory increase. (JSArrayBufferData especially)

An alternate path to viewing the memory leak at a higher level is to monitor via pm2:
$ yarn / npm install pm2
$ pm2 start server.js
$ pm2 monit
-- proceed to access http://localhost:3000/pdf repeatedly
-- and watch the memory footprint grow.

Note that the memory footprint grows in proportion to the size of the PDF generated, so if you want to see it grow/leak more readily, you can create a larger PDF with more content, images, etc.

Thank you in advance for your help!

@diegomura
Copy link
Owner

Hey @coreyweidenhammer
Sorry I didn't respond earlier. Thanks for reporting this and the project to replicate it.

I don't know wat may be causing this to be honest. My first guess is that react-pdf caches assets in memory to avoid network request for the same item, but it has a limit, so the memory footprint should stop increasing eventually.

I'll try to find some time to look a it, but it's going to be hard to debug.

@diegomura
Copy link
Owner

Related to foliojs/pdfkit#258 ?

@kations
Copy link

kations commented Dec 20, 2018

i also experience memory leaks on client rendering but only on some devices

@kations
Copy link

kations commented Dec 20, 2018

img_0275

@kations
Copy link

kations commented Dec 20, 2018

It just happens on some client machines of different kind, Android, Mac etc. but I can't reproduce it myself.

@n6g7
Copy link

n6g7 commented Jan 9, 2019

tl;dr: Each render leaks PDFKit objects.


I've been running into this issue as well and put together a small reproduction gist: https://gist.github.com/n6g7/4d9d9af8eb5b943b61c89c1c6fa29a3b

When running this on my mac I get the data in this spreadsheet.
Basically the the example above leaks about 250kB per render in memory.

I also used node-heapdump to inspect the objects in memory and I noticed many StandardFont, Root, PDFDocument, PDFReference, etc objects that aren't being garbage collected. I think these are pdfkit classes.

I don't really understand what's happenign with pdfkit, its last release was 2 years ago (v0.8.3) but there's been a lot of activity since then on the repo, which doesn't appear to have been released. I assume this is why react-pdf uses a fork (@react-pdf/pdfkit).

Next step is to reproduce the leak using pdfkit only, could someone point to the source code of @react-pdf/pdfkit?

@diegomura
Copy link
Owner

Thanks for your input and work on this!
Sorry I cannot help much right now. I’m out of town until tomorrow. I’ll soon resume my work on this library.
About the @react-pdf/pdfkit source code, you can find it here

@Skywalker13
Copy link

Skywalker13 commented Jan 30, 2019

@kations the problem is just yoga-layout-prebuilt (dep of @react-pdf/renderer) which is an asm.js build of yoga-layout (it's not a memory leak). If you use electron or node, you can prefer to patch @react-pdf/renderer in order to use yoga-layout, then it will build a native node module instead. The TOTAL_MEMORY=134217728 limit is only for the asm.js build.

@n6g7
Copy link

n6g7 commented Jan 30, 2019

@Skywalker13 I get similar results with a patched version of @react-pdf/renderer (using yoga-layout instead of yoga-layout-prebuilt).
Could you elaborate on the changes you've made?

@Skywalker13
Copy link

Skywalker13 commented Jan 31, 2019

I suppose that you use react-pdf in frontend (browser side) and not in server side (with babel-node like me). In this case you can continue to use the asm.js version.

Look at the package.json of yoga-layout...

If I build for node by hand:

$ npm run build:node   

> yoga-layout@1.9.3 build:node /home/schroeterm/tmp/node_modules/yoga-layout
> npm run copy-sources && autogypi && node-gyp configure build


> yoga-layout@1.9.3 copy-sources /home/schroeterm/tmp/node_modules/yoga-layout
> ! npm -s run is-monolithic || (rsync -r --checksum --delete ../yoga/ sources/yoga/)

make : on entre dans le répertoire « /home/schroeterm/tmp/node_modules/yoga-layout/build »
  COPY Release/obj.target/nbind/geni/symbols.txt
  CXX(target) Release/obj.target/nbind/sources/yoga/Utils.o
  CXX(target) Release/obj.target/nbind/sources/yoga/YGEnums.o
  CXX(target) Release/obj.target/nbind/sources/yoga/YGNode.o
  CXX(target) Release/obj.target/nbind/sources/yoga/YGNodePrint.o
  CXX(target) Release/obj.target/nbind/sources/yoga/Yoga.o
BLABLABLA...
  SOLINK_MODULE(target) Release/obj.target/nbind.node
  COPY Release/nbind.node
make : on quitte le répertoire « /home/schroeterm/tmp/node_modules/yoga-layout/build »

I have a node module for nbind which is used by yoga-layout.

$ ls -la build/Release/
total 516
drwxr-xr-x 4 schroeterm schroeterm   4096 jan 31 07:08 .
drwxr-xr-x 3 schroeterm schroeterm   4096 jan 31 07:08 ..
drwxr-xr-x 3 schroeterm schroeterm   4096 jan 31 07:08 .deps
-rwxr-xr-x 1 schroeterm schroeterm 508360 jan 31 07:08 nbind.node
drwxr-xr-x 3 schroeterm schroeterm   4096 jan 31 07:08 obj.target

No more TOTAL_MEMORY problem because it's a native build. Of course, it can be used only on server side.

And if you want to build for the browser, you must install an emscripten environment and change the TOTAL_MEMORY to an higher value. For example:

diff --git a/javascript/final-flags.gypi b/javascript/final-flags.gypi
index dd13e827..d3e1ad26 100644
--- a/javascript/final-flags.gypi
+++ b/javascript/final-flags.gypi
@@ -19,7 +19,7 @@
             "ldflags": [
                 "--memory-init-file", "0",
                 "-s", "PRECISE_F32=1",
-                "-s", "TOTAL_MEMORY=134217728"
+                "-s", "TOTAL_MEMORY=268435456"
             ],
 
            "xcode_settings": {

And try to run npm run build:browser

@diegomura diegomura added the bug label Mar 20, 2019
@Hopding
Copy link

Hopding commented Mar 23, 2019

We’ve encountered these heap overflow errors as well. They happen reliably when we try to generate documents with a large number of pages (~70 pages), or when we render a large number of separate documents with a single page.

@Hopding
Copy link

Hopding commented Mar 23, 2019

@diegomura It looks like pdfkit has been updated with some fixes that might solve this memory leak problem: foliojs/pdfkit#258 (comment). How much work is involved in pulling these fixes into @react-pdf/pdfkit?

@Tketa
Copy link

Tketa commented Apr 5, 2019

Have encountered this issue as well.

Worked around it by moving the PDF generation to an AWS lambda for now. @diegomura do you need help pulling the changes in foliojs/pdfkit#258 (comment) to your own pdfkit? Quite new here but happy to help if you think it could solve the issue here

@diegomura
Copy link
Owner

Yes!! Help is much appreciated. Don't have much time to that yet @Tketa

@n6g7
Copy link

n6g7 commented Apr 5, 2019

@Tketa let me know if I can help!

@maxaggedon
Copy link
Contributor

maxaggedon commented Apr 8, 2019

@Tketa @diegomura @n6g7 I think this is critical to anyone using react-pdf in production, and it needs to go forward quickly. How do you think we should proceed ?

  • Someone opens up a PR to bring upstream changes into @react-pdf/pdfkit
  • We then resolve conflicts
  • Everyone reviews it, especially @diegomura
  • Everyone tests it in apps where they use react-pdf, and we use the gist from @n6g7 to test as well

How does that sound to you ?

@n6g7
Copy link

n6g7 commented Apr 8, 2019

@maxaggedon Sounds good to me, I'm quite busy this week so can't commit to opening the PR and resolving conflicts but will review and test in my projects for sure.

@diegomura
Copy link
Owner

Also sounds good to me. I'm also in the middle of something right now (react-pdf related also), so I'm not sure if I can commit to work on this right now. I agree with your plan and I promise I'll review code when needed

@Tketa
Copy link

Tketa commented Apr 8, 2019

Hey @maxaggedon, agree with the plan but as mentioned I fixed the issue with the serverless solution so not a priority for me this week, can work on it this weekend however 👍

@maxaggedon
Copy link
Contributor

know everyone is on board ! I will try to start the work during the week.

@maxaggedon
Copy link
Contributor

PR is open 👆!

@diegomura
Copy link
Owner

Thanks @maxaggedon !! You rock
I'll try to find some time this week to fully test that PR

@bhunjadi
Copy link
Contributor

bhunjadi commented May 2, 2019

I've tried to pinpoint where exactly is @n6g7 gist leaking and made a couple of changes to leaktest.js:

function go(n) {
  for (let i = 1; i <= n; i++) {
    if (i === 1 || i === 500) {
      heapdump.writeSnapshot('out/' + Date.now() + '.heapsnapshot');
    }

    const element = React.createElement(Component);

    // taken from pdf() function
    const container = createInstance({ type: 'ROOT' });
    const mountNode = PDFRenderer.createContainer(container);
    // leaks occur because of updateContainer
    PDFRenderer.updateContainer(element, mountNode, null);
    // leak even more
    // container.render();

    ...
  }
}

We don't even need to render() for leaks to occur, something in PDFRenderer.updateContainer is maybe leaking. In that case we have a leaked Root and Page instances in heapdump.
But I don't now anything about ReactFiberReconciler so could this be expected behaviour maybe?
@diegomura Any ideas?

[edit]
Looks like after using global.gc() just before writing snapshot everything is fine.

@bhunjadi
Copy link
Contributor

bhunjadi commented May 3, 2019

@diegomura
After doing some additional research, it looks like the problem is in the line
this.layout.setMeasureFunc(this.measureImage.bind(this));

This is in both Image and Text constructor.
Because of that text and image instances are stored in layout and can't be freed.

Somewhere in the code must be a call to this.layout.unsetMeasureFunc().

I think you were actually on the right path in this comment

Just to confirm that this is an issue I tried leaktest.js with modified render() of Text class:

async render() {
    // rendering code
    this.layout.unsetMeasureFunc();
  }

For a page with wrap: false this produced no leaks.
But for a page with wrap: true there were leaks just as before because for the cloned Text intances unsetMeasureFunc was not called.

The question is where should measureFunc be unset?

@n6g7
Copy link

n6g7 commented May 5, 2019

Amazing progress, thanks @bhunjadi!!

  • I was able to reproduce your investigation: with the wrap: false prop in <Page>, the following does not leak:
    diff --git a/src/elements/Text.js b/src/elements/Text.js
    index 67db03e..c9f42a0 100644
    --- a/src/elements/Text.js
    +++ b/src/elements/Text.js
    @@ -264,6 +264,7 @@ class Text extends Base {
        }
    
        this.root.instance.restore();
    +   this.layout.unsetMeasureFunc();
      }
    }
  • Adding a similar call to the clone method also fixes it for wrap: false:
    diff --git a/src/elements/Text.js b/src/elements/Text.js
    index 67db03e..5d17f83 100644
    --- a/src/elements/Text.js
    +++ b/src/elements/Text.js
    @@ -216,6 +216,7 @@ class Text extends Base {
        const text = super.clone();
    
        text.layoutEngine = this.layoutEngine;
    +   this.layout.unsetMeasureFunc();
    
        // Save calculated layout for non-dynamic clone elements
        if (this.blocks && !this.props.render) {

This looks like a very promising lead now. :)

@diegomura any ideas where would be the correct place to call unsetMeasureFunc?

@diegomura
Copy link
Owner

Thanks for this amazing research! Sorry I've been a bit unresponsive.

I guess a good place to call unsetMeasureFunc is either before or after rendering, both should work since at that stage every elements knows where to position. However, because how page wrapping works (recursively clone elements while they are being placed in pages), it's highly probable that unsetMeasureFunc should be called also somewhere else. But not really sure where yet without further analysis.

One of the main performance bottlenecks of this project is the page wrapping algorithm. As I said, in order to contemplate all cases, it clones and re-layout many times, which makes it very hard to debug and uses a lot of memory. I've been trying to think of a simpler solution but couldn't find one yet that supports all what we currently do. And there's no prior state of the art solution to this (that I could find).

@bhunjadi
Copy link
Contributor

bhunjadi commented May 6, 2019

I've added PR that looks like it is working.

Thanks @n6g7 for the clone hint for wrapped pages. In the end what I used in PR is cleanup in removeChild which seems to be working.

@coreyweidenhammer
Copy link
Author

Thank you to everyone who has worked on this!

My team is still wrapping up testing all of our scenarios, but preliminary results from v1.5.6 appear to confirm that the memory leak is indeed resolved. I'll post another update when we are finished.

@diegomura
Copy link
Owner

Thank you @coreyweidenhammer for testing! Please post any further details. Closing this issue 😄

@jeffijoe
Copy link

jeffijoe commented Jun 12, 2019

I'm on @react-pdf/renderer@^1.6.2 and having this problem it seems. My use case is exporting chat transcripts which can have a lot of messages, and after having exported around 700 PDFs, it stops doing anything (or panics with a heap out of memory error).

I tried using clinic to profile, and it indeed seems memory just keeps ramping up. Am I missing something? 😄


EDIT: Seems I'm having another problem, will open a new issue 😄

@oswaldoacauan
Copy link

Anyone got around of building for Yoga for node? We are running the latest v2 and we are facing everyday this issue.

@Skywalker13

@Skywalker13
Copy link

Patches exists for nbind and node v12, but it's annoying because it's necessary to fork yoga-layout in order to use an other nbind. And even in this case it's annoying because it breaks with electron 8 and in my case, I need yoga with node 12 and electron 8.

I've tried yoga-wasm but it's broken with react-pdf: rickbutton/yoga-wasm#4

And just now, I've discovered https://github.com/pinqy520/yoga-layout-wasm

I will try this one asap...

@oswaldoacauan
Copy link

@Skywalker13 any luck on your try outs? We have no solution for this so far other than manually restarting the process :(

@Skywalker13
Copy link

I tried a bit, it's not just a drop-in replacement. It's necessary to fork and adapt a little bit react-pdf (because the init is async with the wasm variants like yoga-layout-wasm), and maybe it doesn't work (like yoga-wasm), I don't know.

I've some success with react-pdf and yoga-layout built for some node versions but it's a bit annoying (for example Xcraft-Inc/yoga@a0039ce).

IMHO It's definitively react-pdf which should use something else...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests