Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cap max outstanding async ids that need to be garbage collected #33896

Closed
jason-codaio opened this issue Jun 15, 2020 · 10 comments
Closed

Cap max outstanding async ids that need to be garbage collected #33896

jason-codaio opened this issue Jun 15, 2020 · 10 comments
Labels
async_hooks Issues and PRs related to the async hooks subsystem.

Comments

@jason-codaio
Copy link

Is your feature request related to a problem? Please describe.

We have a particularly busy server that uses the popular continuation local storage library, which is implemented using the async hooks api to track the life cycle of async hooks. During high period of async hook generation we exhaust the max size of a Map 16777216 trying to track all outstanding async hook ids because Node has not garbage collected these yet. When I can repro the issue I can peak into the heap dump and see that the system destroy_async_id_list is almost exactly 8 bytes * max size of the map as all by 1-2 of the outstanding async_hooks can be garbage collected.

Note: Please correct me if my assumption is wrong about the usage of destroy_async_id_list and the fact that it contains all of the async ids means they are ready to be garbage collected.

Describe the solution you'd like
I know there are no guarantees on when garbage collection is run but it would be nice to force trigger a GC when the outstanding set of hooks gets this high. A reasonable limit might be 10 million async ids that need to be destroyed?

Without some guarantee a library trying to track async ids has to find a way gracefully degrade during these windows or work around some rather nasty constraints.

@bnoordhuis bnoordhuis added the async_hooks Issues and PRs related to the async hooks subsystem. label Jun 16, 2020
@bnoordhuis
Copy link
Member

Are you sure you don't have a resource leak in your application?

I can see a few thousand unreachable async_hooks objects stay around unreclaimed when there's not enough memory pressure for the garbage collector to come online but 16.7 million? Very unlikely.

If your answer is "yes, I'm sure", then do you perhaps have a small test case that demonstrates the issue? No third-party dependencies please, only built-in modules.

@jason-codaio
Copy link
Author

@bnoordhuis I'm fairly sure doesn't the destroy_async_id_list size prove that?

@bnoordhuis
Copy link
Member

No. If you have a resource leak of some kind (e.g., forgetting to close sockets after you're done with them), then those objects can't be reclaimed because they're considered live.

@jason-codaio
Copy link
Author

Okay, I guess I wasn't following the code correctly. I assumed if they had a live reference they wouldn't be placed on the destroy_async_id_list... I'm going to work on a repro later today and I'll get back to you. @bnoordhuis

@bnoordhuis
Copy link
Member

I assumed if they had a live reference they wouldn't be placed on the destroy_async_id_list

It depends on the type of resource. Some are placed on the list when they become unreachable, others through manual bookkeeping.

If it helps: destroy callbacks run asynchronously. If your program is a while (1) { /* ... */ }, then they won't get a chance to run and the list will keep growing.

@jason-codaio
Copy link
Author

jason-codaio commented Jun 17, 2020

@bnoordhuis just noodling on your last comment destroy callbacks run asynchronously does awaiting in itself not give an async yield that would allow for this to run? Or are you saying basically

  async function someASyncIsTrue(num) {
     if (false) {
       await request('...');
     }
    return true;
 }

  let allTrue = true;
  for (let i = 0; i < 20000000; i++) {
    const result = await someAsyncIsTrue(i);
    if (!result) {
      allTrue = false;
    }
}

Is guaranteed to end up with 20 million async hook references not garbage collected if the async function doesn't actually yield?

@jason-codaio
Copy link
Author

I confirmed in my reproduction that inserting await new Promise(resolve => setTimeout(resolve, 10)); into a critical part of our server loop causes the problem to be removed as the async_hooks get garbage collected during the timeout.

I guess our problem is that we are generating too many async_hooks that are actually being run synchronous... ugg.

@bnoordhuis
Copy link
Member

Is guaranteed to end up with 20 million async hook references not garbage collected if the async function doesn't actually yield?

Just to confirm, that's indeed what happens. You can see it in an example like this:

let tick = () => { console.log('tick'); if (tick) setImmediate(tick) }
async function go() { console.log('go') }
async function main() {
  tick()
  for (let i = 0; i < 100; i++) await go()
  tick = null
}
main()

That prints:

tick
go
go
# ...
go
go
tick

Promises are async but they're dispatched way more often than once per event look "tick", turning your and my example into what is still effectively a busy loop.

@addaleax Do you have ideas on how to improve that? Flushing env->destroy_async_id_list_ in AsyncWrap::EmitDestroy() when it gets too big isn't safe but maybe it could use isolate->EnqueueMicrotask() instead of env->SetImmediate()?

@jason-codaio
Copy link
Author

I just wanted to circle back. First @bnoordhuis thanks for the help!

We have some rather complicated graph traversal / calculation logic which can be CPU intensive, but some nodes (very few) can require IO. This forced us to make the entire system async as one async call can basically poison an entire system. Anyway the payload hitting this issue had no actual IO so the 16 million+ async calls were all synchronous and effectively locking the main thread. Adding a setImmediate yield at a tactical location in our traversal logic made sure we were yielding.

I realized this time bomb in our code was regressed when we moved from Bluebird promises to native promises as it was yielding for us.

Anyway I'm happy to close this as it is user error. I do find it unfortunate that a developer has to consider if the async calls will yield or not, and it would be nice if Node could find a clever solution to that. That said I realize that would probably invalidate constraints in the system is potentially not possible :)

@addaleax
Copy link
Member

@addaleax Do you have ideas on how to improve that? Flushing env->destroy_async_id_list_ in AsyncWrap::EmitDestroy() when it gets too big isn't safe but maybe it could use isolate->EnqueueMicrotask() instead of env->SetImmediate()?

I think we could use a second-pass GC callback? The current code was written under the assumption that GC will have a tendency to collect multiple resources at once and it’s easier to just handle all of them in one go, and we can handle destroy emits from GC and from explicit destroy calls the same way,

I don’t think using isolate->EnqueueMicrotask() would make a significant difference here, and from the sounds of it, not in the situation that this bug report is about anyway.

Flarna added a commit to dynatrace-oss-contrib/node that referenced this issue Jul 30, 2020
Use a microtask to call destroy hooks in case there are a lot queued
as immediate may be scheduled late in case of long running
promise chains.

Queuing a mircrotasks in GC context is not allowed therefore an
interrupt is triggered to do this in JS context as fast as possible.

fixes: nodejs#34328
refs: nodejs#33896
Flarna added a commit that referenced this issue Aug 2, 2020
Use a microtask to call destroy hooks in case there are a lot queued
as immediate may be scheduled late in case of long running
promise chains.

Queuing a mircrotasks in GC context is not allowed therefore an
interrupt is triggered to do this in JS context as fast as possible.

fixes: #34328
refs: #33896

PR-URL: #34342
Fixes: #34328
Refs: #33896
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere pushed a commit that referenced this issue Aug 5, 2020
Use a microtask to call destroy hooks in case there are a lot queued
as immediate may be scheduled late in case of long running
promise chains.

Queuing a mircrotasks in GC context is not allowed therefore an
interrupt is triggered to do this in JS context as fast as possible.

fixes: #34328
refs: #33896

PR-URL: #34342
Fixes: #34328
Refs: #33896
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
addaleax pushed a commit that referenced this issue Sep 22, 2020
Use a microtask to call destroy hooks in case there are a lot queued
as immediate may be scheduled late in case of long running
promise chains.

Queuing a mircrotasks in GC context is not allowed therefore an
interrupt is triggered to do this in JS context as fast as possible.

fixes: #34328
refs: #33896

PR-URL: #34342
Fixes: #34328
Refs: #33896
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
addaleax pushed a commit that referenced this issue Sep 22, 2020
Use a microtask to call destroy hooks in case there are a lot queued
as immediate may be scheduled late in case of long running
promise chains.

Queuing a mircrotasks in GC context is not allowed therefore an
interrupt is triggered to do this in JS context as fast as possible.

fixes: #34328
refs: #33896

PR-URL: #34342
Fixes: #34328
Refs: #33896
Reviewed-By: Gus Caplan <me@gus.host>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async_hooks Issues and PRs related to the async hooks subsystem.
Projects
None yet
Development

No branches or pull requests

3 participants