
Possible memory leak in http2 #21332

Closed
cTn-dev opened this issue Jun 14, 2018 · 14 comments
Labels
http2: Issues or PRs related to the http2 subsystem.
memory: Issues and PRs related to the memory management or memory footprint.

Comments

@cTn-dev

cTn-dev commented Jun 14, 2018

  • Version: v10.4.1
  • Platform: Windows 10 x64, Ubuntu 18.04 x64
  • Subsystem: http2

Hi, this is my first bug report for Node, hopefully I get it right!

TLDR: http2session.request appears to be leaking memory outside of JavaScript (the heap is clean, but the resident set size keeps growing).

I am attaching rather crude test code which demonstrates the issue and reproduces it every time.
The attached zip file contains client.js, server.js and a "keys" folder containing self-signed SSL keys.

http2test.zip

  1. Start the server (it will be running on port 8080): node server.js
  2. Start the client: node --expose-gc client.js
  3. Let the client run for a little while.

Depending on the performance of your machine, you should see the "test output" in < 15 seconds or so.
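
For reference, a secure HTTP/2 server along these lines reproduces the setup; this is only a minimal sketch of what the attached server.js might look like (the key file names and response body are assumptions, not the original code):

```js
// Minimal sketch of an HTTP/2 server similar in spirit to the attached server.js.
// The key/cert paths and the response body are assumptions.
const http2 = require('http2');
const fs = require('fs');

const server = http2.createSecureServer({
  key: fs.readFileSync('keys/server.key'),   // self-signed key (assumed path)
  cert: fs.readFileSync('keys/server.crt'),  // self-signed certificate (assumed path)
});

server.on('stream', (stream) => {
  // Answer every request with a tiny payload.
  stream.respond({ ':status': 200, 'content-type': 'text/plain' });
  stream.end('ok');
});

server.listen(8080);
```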

What happens

  1. The client establishes a connection to the server and makes 1 request.
  2. After we get a reply, we save the current process.memoryUsage() for later comparison.
  3. The client sends 5 parallel requests 20 000 times to the server, 100k total.
  4. We trigger GC and wait for 10 seconds, then trigger GC again (I need heapTotal to shrink and this seems to work).
  5. We grab the latest process.memoryUsage() and calculate the deltas for rss, heapTotal and heapUsed (see the sketch below).

You should get output similar to this:
Deltas: 0 KB Heap Total, -52 KB Heap Used, 10436 KB RSS
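
For illustration, the measurement loop described in the steps above could look roughly like this; it is a sketch rather than the attached client.js, and the URL, counts and helper names are assumptions:

```js
// Rough sketch of the measurement procedure described above (not the attached client.js).
// Run with: node --expose-gc client.js
const http2 = require('http2');

const session = http2.connect('https://localhost:8080', {
  rejectUnauthorized: false, // the repro uses a self-signed certificate
});

// One GET request over the existing session, resolving once the response has been consumed.
function request() {
  return new Promise((resolve, reject) => {
    const req = session.request({ ':path': '/' });
    req.on('data', () => {}); // drain the response body
    req.on('end', resolve);
    req.on('error', reject);
    req.end();
  });
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const kb = (n) => Math.round(n / 1024);

(async () => {
  await request();                         // step 1: one request to warm up
  const before = process.memoryUsage();    // step 2: baseline for later comparison

  for (let i = 0; i < 20000; i++) {        // step 3: 5 parallel requests, 20 000 times (100k total)
    await Promise.all([request(), request(), request(), request(), request()]);
  }

  global.gc();                             // step 4: GC, wait 10 seconds, GC again
  await sleep(10000);
  global.gc();

  const after = process.memoryUsage();     // step 5: deltas for heapTotal, heapUsed and rss
  console.log(`Deltas: ${kb(after.heapTotal - before.heapTotal)} KB Heap Total, ` +
              `${kb(after.heapUsed - before.heapUsed)} KB Heap Used, ` +
              `${kb(after.rss - before.rss)} KB RSS`);
  session.close();
})();
```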

Feel free to play with the number of requests you make; a higher number obviously takes longer to execute but leaves a much higher RSS.

Here is a graph of RSS, heap total and heap used over 24h on a system that does ~30k requests per day; it has been running for several days (hence the larger RSS).

[Graph: RSS, heap total and heap used over 24 hours]

Sorry I wasn't able to narrow it down any further (it took me over a month to trace the issue this far in my system, as it kept leaking rather slowly and heap snapshot comparisons never revealed anything).

@Trott
Member

Trott commented Jun 14, 2018

@nodejs/http2

@Trott added the http2 label Jun 14, 2018
@ryzokuken
Contributor

Thanks @cTn-dev for the extremely comprehensive bug report, that helps.

/cc @addaleax maybe you know what's wrong here?

@addaleax
Member

@ryzokuken Looked into it, but couldn’t find anything yet… It’s not #21336, that just popped up while investigating. :)

@ryzokuken
Copy link
Contributor

I wish it were related and we had a fix ready :)
I'll try reproducing this first. Do you think we should also ping the V8 peeps, since this might be something with the GC (highly unlikely, but maybe)?

@addaleax
Member

@ryzokuken It’s an increase in RSS but not in heap size … it’s not what we’d expect from a typical leak in JS

Also, as I understand it, this only happens with http2?

@cTn-dev
Author

cTn-dev commented Jun 15, 2018

@addaleax If you were referring to http/https, I haven't written a detailed test to look there, but after switching to https in my project the abnormal RSS increase completely went away, so I would say https is fine as of right now.

@apapirovski
Member

I'll investigate over the weekend if this isn't fixed by then.

@addaleax
Member

@cTn-dev I couldn’t find anything so far … while they don’t directly affect the results in your reproduction, could you check whether #21336 and/or #21373 solve the problem for you in production?

@ChALkeR added the memory label Jun 17, 2018
@ChALkeR
Member

ChALkeR commented Jun 17, 2018

I couldn't reproduce the result under valgrind --tool=massif, perhaps I am doing something wrong?
Also, I don't observe a linearly growing RSS as the total request count increases. This doesn't look like a memory leak to me.

I expect the inconsistencies in the observed RSS memory usage to be caused by the memory allocator here.

@ChALkeR
Member

ChALkeR commented Jun 17, 2018

Moreover, calling gc() often (e.g. every 100 requests) seems to decrease the observed RSS change significantly. With total=100000, parallel=5, gc=100 it changes the RSS difference from ~35 MiB down to ~5 MiB. That works even with scavenges (gc(true)).

My test code (client only, server is the same): https://gist.github.com/ChALkeR/aa392f5bb957279f0527e7f74e3b72da

I would say that something suboptimal is going on with memory allocation and garbage collection here, but this doesn't look like a memory leak. I.e., the memory is freed correctly, but RSS doesn't decrease completely because of how the memory allocator works.
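
To make the periodic-gc variation concrete, here is a sketch of the pattern (not the linked gist); it assumes a request() helper like the one in the client sketch earlier in this thread:

```js
// Sketch of the periodic-gc variation: call gc() roughly every `gcEvery` requests.
// Assumes a request() helper as in the client sketch above; requires --expose-gc.
const total = 100000;
const parallel = 5;
const gcEvery = 100;

async function run() {
  let done = 0;
  while (done < total) {
    await Promise.all(Array.from({ length: parallel }, () => request()));
    done += parallel;
    if (done % gcEvery === 0) {
      global.gc(); // full GC; the comment above also mentions gc(true) for scavenges
    }
  }
}
```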

@cTn-dev
Author

cTn-dev commented Jun 17, 2018

@addaleax could you point me to a nightly build that contains these changes? (I don't really have much experience with compiling Node, nor do I have the environment set up for that.)

I can test it in production to see if it behaves differently; I can definitely see #21336 being related, as there are some "short" custom headers used.

Considering what @ChALkeR said, I am less sure about this being a leak in http2; if that's the case, I am really sorry for pointing you in the wrong direction :(

What initially put me off the "this is just how GC behaves" explanation is the fact that the observed RSS growth appears to completely ignore --max_old_space_size.

@ChALkeR
Member

ChALkeR commented Jun 17, 2018

@cTn-dev

  1. RSS not going down is not always a sign of a memory leak; it could be caused by memory fragmentation. E.g., if you malloc (with the glibc impl) 1 GiB worth of small strings in pure C, then free() them all except the last one, RSS would still stay at 1 GiB. That's because glibc uses sbrk for small allocations, which cannot be shrunk back partially in that case. There are memory allocators that don't use brk, e.g. the OpenBSD malloc, but they pay for that by being slower or more memory-consuming in some (likely common) cases.

  2. Based on running your testcase with valgrind massif I don't see a memleak (it uses its own memory allocator to track things, afaik), so I assume that the RSS not going down is caused by memory fragmentation.

  3. My testcase with frequent gc calls shows that those objects are indeed collected, and if gc() is called more often, fragmentation seems to be lower because there is more freed memory to reuse instead of allocating new memory.

  4. RSS slowly growing might indeed be caused by GC optimizations, but in my observations 30k, 100k, and 1000k requests (with the same concurrency=5) all result in an identical increase of ~37 MiB, so this looks to be bounded. Note: measured without frequent manual gc() calls.

  5. This indeed looks to be unaffected by --max-old-space-size. Not sure if it should be, cc @addaleax? Perhaps V8 is not aware of some of the native http2 objects and is not applying proper GC pressure?

Note: I am not entirely sure my analysis above is correct; let's wait for what @addaleax thinks on this. I might have missed something.

@cTn-dev
Author

cTn-dev commented Jun 22, 2018

@addaleax Since #21336 made it into the 10.5.0 release I have been running tests on several of the production systems and it's looking good.

I will make sure to put #21373 and the new custom memory allocator #21374 through testing once they make it to the next release.

When it comes to my original issue, this appears to be fully resolved in 10.5.0, yay \o/

I will keep this ticket open for now in case there is anything more to add here.

@apapirovski
Member

@cTn-dev I might've misread, but it seems like the issues have been resolved? If so, I will close this out, but do feel free to reopen if I misunderstood or if you find more issues. I'm strictly doing this so it's easier for everyone to identify which issues still exist and to focus on fixing them.
