Cap max outstanding async ids that need to be garbage collected #33896

jason-codaio · 2020-06-15T20:17:44Z

Is your feature request related to a problem? Please describe.

We have a particularly busy server that uses the popular continuation local storage library, which is implemented using the async hooks api to track the life cycle of async hooks. During high period of async hook generation we exhaust the max size of a Map 16777216 trying to track all outstanding async hook ids because Node has not garbage collected these yet. When I can repro the issue I can peak into the heap dump and see that the system destroy_async_id_list is almost exactly 8 bytes * max size of the map as all by 1-2 of the outstanding async_hooks can be garbage collected.

Note: Please correct me if my assumption is wrong about the usage of destroy_async_id_list and the fact that it contains all of the async ids means they are ready to be garbage collected.

Describe the solution you'd like
I know there are no guarantees on when garbage collection is run but it would be nice to force trigger a GC when the outstanding set of hooks gets this high. A reasonable limit might be 10 million async ids that need to be destroyed?

Without some guarantee a library trying to track async ids has to find a way gracefully degrade during these windows or work around some rather nasty constraints.

The text was updated successfully, but these errors were encountered:

bnoordhuis · 2020-06-16T09:36:24Z

Are you sure you don't have a resource leak in your application?

I can see a few thousand unreachable async_hooks objects stay around unreclaimed when there's not enough memory pressure for the garbage collector to come online but 16.7 million? Very unlikely.

If your answer is "yes, I'm sure", then do you perhaps have a small test case that demonstrates the issue? No third-party dependencies please, only built-in modules.

jason-codaio · 2020-06-16T15:48:36Z

@bnoordhuis I'm fairly sure doesn't the destroy_async_id_list size prove that?

bnoordhuis · 2020-06-16T17:27:07Z

No. If you have a resource leak of some kind (e.g., forgetting to close sockets after you're done with them), then those objects can't be reclaimed because they're considered live.

jason-codaio · 2020-06-16T17:49:04Z

Okay, I guess I wasn't following the code correctly. I assumed if they had a live reference they wouldn't be placed on the destroy_async_id_list... I'm going to work on a repro later today and I'll get back to you. @bnoordhuis

bnoordhuis · 2020-06-16T18:47:17Z

I assumed if they had a live reference they wouldn't be placed on the destroy_async_id_list

It depends on the type of resource. Some are placed on the list when they become unreachable, others through manual bookkeeping.

If it helps: destroy callbacks run asynchronously. If your program is a while (1) { /* ... */ }, then they won't get a chance to run and the list will keep growing.

jason-codaio · 2020-06-17T21:30:35Z

@bnoordhuis just noodling on your last comment destroy callbacks run asynchronously does awaiting in itself not give an async yield that would allow for this to run? Or are you saying basically

  async function someASyncIsTrue(num) {
     if (false) {
       await request('...');
     }
    return true;
 }

  let allTrue = true;
  for (let i = 0; i < 20000000; i++) {
    const result = await someAsyncIsTrue(i);
    if (!result) {
      allTrue = false;
    }
}

Is guaranteed to end up with 20 million async hook references not garbage collected if the async function doesn't actually yield?

jason-codaio · 2020-06-17T22:20:42Z

I confirmed in my reproduction that inserting await new Promise(resolve => setTimeout(resolve, 10)); into a critical part of our server loop causes the problem to be removed as the async_hooks get garbage collected during the timeout.

I guess our problem is that we are generating too many async_hooks that are actually being run synchronous... ugg.

bnoordhuis · 2020-06-18T07:41:52Z

Is guaranteed to end up with 20 million async hook references not garbage collected if the async function doesn't actually yield?

Just to confirm, that's indeed what happens. You can see it in an example like this:

let tick = () => { console.log('tick'); if (tick) setImmediate(tick) }
async function go() { console.log('go') }
async function main() {
  tick()
  for (let i = 0; i < 100; i++) await go()
  tick = null
}
main()

That prints:

tick
go
go
# ...
go
go
tick

Promises are async but they're dispatched way more often than once per event look "tick", turning your and my example into what is still effectively a busy loop.

@addaleax Do you have ideas on how to improve that? Flushing env->destroy_async_id_list_ in AsyncWrap::EmitDestroy() when it gets too big isn't safe but maybe it could use isolate->EnqueueMicrotask() instead of env->SetImmediate()?

jason-codaio · 2020-06-18T16:01:14Z

I just wanted to circle back. First @bnoordhuis thanks for the help!

We have some rather complicated graph traversal / calculation logic which can be CPU intensive, but some nodes (very few) can require IO. This forced us to make the entire system async as one async call can basically poison an entire system. Anyway the payload hitting this issue had no actual IO so the 16 million+ async calls were all synchronous and effectively locking the main thread. Adding a setImmediate yield at a tactical location in our traversal logic made sure we were yielding.

I realized this time bomb in our code was regressed when we moved from Bluebird promises to native promises as it was yielding for us.

Anyway I'm happy to close this as it is user error. I do find it unfortunate that a developer has to consider if the async calls will yield or not, and it would be nice if Node could find a clever solution to that. That said I realize that would probably invalidate constraints in the system is potentially not possible :)

addaleax · 2020-06-19T14:19:04Z

@addaleax Do you have ideas on how to improve that? Flushing env->destroy_async_id_list_ in AsyncWrap::EmitDestroy() when it gets too big isn't safe but maybe it could use isolate->EnqueueMicrotask() instead of env->SetImmediate()?

I think we could use a second-pass GC callback? The current code was written under the assumption that GC will have a tendency to collect multiple resources at once and it’s easier to just handle all of them in one go, and we can handle destroy emits from GC and from explicit destroy calls the same way,

I don’t think using isolate->EnqueueMicrotask() would make a significant difference here, and from the sounds of it, not in the situation that this bug report is about anyway.

Use a microtask to call destroy hooks in case there are a lot queued as immediate may be scheduled late in case of long running promise chains. Queuing a mircrotasks in GC context is not allowed therefore an interrupt is triggered to do this in JS context as fast as possible. fixes: nodejs#34328 refs: nodejs#33896

Use a microtask to call destroy hooks in case there are a lot queued as immediate may be scheduled late in case of long running promise chains. Queuing a mircrotasks in GC context is not allowed therefore an interrupt is triggered to do this in JS context as fast as possible. fixes: #34328 refs: #33896 PR-URL: #34342 Fixes: #34328 Refs: #33896 Reviewed-By: Gus Caplan <me@gus.host> Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>

bnoordhuis added the async_hooks Issues and PRs related to the async hooks subsystem. label Jun 16, 2020

jason-codaio closed this as completed Jun 22, 2020

bnoordhuis mentioned this issue Jul 13, 2020

Memory leak on Promise.all with async/await #34328

Closed

Flarna mentioned this issue Jul 13, 2020

async_hooks: execute destroy hooks earlier #34342

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cap max outstanding async ids that need to be garbage collected #33896

Cap max outstanding async ids that need to be garbage collected #33896

jason-codaio commented Jun 15, 2020

bnoordhuis commented Jun 16, 2020

jason-codaio commented Jun 16, 2020

bnoordhuis commented Jun 16, 2020

jason-codaio commented Jun 16, 2020

bnoordhuis commented Jun 16, 2020

jason-codaio commented Jun 17, 2020 •

edited

Loading

jason-codaio commented Jun 17, 2020

bnoordhuis commented Jun 18, 2020

jason-codaio commented Jun 18, 2020

addaleax commented Jun 19, 2020

Cap max outstanding async ids that need to be garbage collected #33896

Cap max outstanding async ids that need to be garbage collected #33896

Comments

jason-codaio commented Jun 15, 2020

bnoordhuis commented Jun 16, 2020

jason-codaio commented Jun 16, 2020

bnoordhuis commented Jun 16, 2020

jason-codaio commented Jun 16, 2020

bnoordhuis commented Jun 16, 2020

jason-codaio commented Jun 17, 2020 • edited Loading

jason-codaio commented Jun 17, 2020

bnoordhuis commented Jun 18, 2020

jason-codaio commented Jun 18, 2020

addaleax commented Jun 19, 2020

jason-codaio commented Jun 17, 2020 •

edited

Loading