Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async_hooks: AsyncLocalStorage to diagnostics_channel integration #45277

Closed
wants to merge 4 commits into from

Conversation

Qard
Copy link
Member

@Qard Qard commented Nov 2, 2022

This is an alternate and simplified way to express the integration between AsyncLocalStorage and diagnostics_channel attempted in #44894. I think this might be a better approach. What do other @nodejs/diagnostics folks think? I'll write up some docs for it soon, if this seems like a clearer, more coherent approach.

Depends on #44943

@Qard Qard added http Issues or PRs related to the http subsystem. async_hooks Issues and PRs related to the async hooks subsystem. diagnostics_channel Issues and PRs related to diagnostics channel labels Nov 2, 2022
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/http
  • @nodejs/net

@nodejs-github-bot nodejs-github-bot added the needs-ci PRs that need a full CI run. label Nov 2, 2022
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this significantly easier to understand! I have one clarifying question still: why do we need TracingChanel at all if most of this could be achieved by having a convention of .start and .end in diagnostic_channels? The various trace* methods could be implemented as static utilities.

lib/diagnostics_channel.js Outdated Show resolved Hide resolved
lib/diagnostics_channel.js Show resolved Hide resolved
lib/diagnostics_channel.js Show resolved Hide resolved
lib/diagnostics_channel.js Show resolved Hide resolved
ctx.result = result;
asyncEnd.publish(ctx);
return result;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These closures can be costly if there are no subscribers.

@@ -351,6 +353,49 @@ class AsyncLocalStorage {
return resource[this.kResourceStore];
}
}

bindToTracingChannel(channel, transform = (v) => v) {
if (typeof channel === 'string') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we throw if it is not a string? If passing a channel object is allowed should be somehow validate it?
Symbols as channel names are also allowed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will be adding some validation that if it's not a string it's a tracing channel, just didn't get that far yesterday. Also, while symbols are valid for regular channels, they currently are not for tracing channels due to string concatenation use.

@@ -1063,6 +1069,10 @@ function parserOnIncoming(server, socket, state, req, keepAlive) {
server.emit('request', req, res);
}

if (onRequestEndChannel.hasSubscribers) {
onRequestEndChannel.publish(dcMessage);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess a try block is needed to ensure end is called even if an exception is thrown.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. And also should have the error channel to fully match tracing channel.

if (store.enabled) {
ctx[store.kResourceStore] = store.getStore();
}
store.enterWith(transform(ctx.data));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess transform requires exception handling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, yes.

unsubscribe,
Channel
Channel,
TracingChannel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it needed to export this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I was going to use that for an instanceof check on the AsyncLocalStorage side, just hadn't got to that before I stopped yesterday.

@Flarna
Copy link
Member

Flarna commented Nov 2, 2022

Could you please provide some more background why this is needed in core? I'm sure you have a valid real world usecase in mind so it would be nice if you could share it here.

I'm also a bit concerned about hiding the use of AsyncLocalStorage#enterWith. To my understanding enterWith is still experimental because it's seen as unsafe. Now it's potentially called automagically via diagnostics channel. If the corresponding end event is not emitted (or emitted async or emitted in wrong order) the local store is in bad state.

This is quite problematic in my opinion because the DC publisher doesn't know/control if it's channels get bound to a local store. So they might be not aware at all about this implicit binding and a simple await added in their code can have significant side effects.

If consensus is that enterWith is unsafe and has to stay experimental because of this I would assume this is inherited by TracingChannel. We should have a path out of experimental here to avoid discussions like parts of #45084

@Qard
Copy link
Member Author

Qard commented Nov 4, 2022

The intent of this API is to avoid people using enterWith directly. Using it here it's in a controlled environment as the TracingChannel interface handles the event ordering so we can be sure of correctness. As for the use case, there's several reasons for this.

First off, we want AsyncLocalStorage and diagnostics_channel in other environments which necessitates more strictly defining a way to express scoping an AsyncLocalStorage run around trace events from diagnostics_channel. For example, in a Cloudflare worker we would like to have a storage scoped to a request. To do this we can use this interface and Cloudflare providing tracing channel events around the sync start and end of their request handler call. In Node.js core, we can wrap these events around http to scope to an http request.

Additionally we can use it to scope around any tracing channel scope, so an APM could do something like this to manage their span stack:

storage.bindToTracingChannel('mysql.query', data => {
  const parentSpan = storage.getStore()
  return new Span('mysql', data, parentSpan)
})

dc.subscribe('mysql.query.asyncEnd', () => {
  const span = storage.getStore()
  span.end()
})

This will set this span object as the current value of the store between the start and end events of the tracing channel and any descending async activity, and will call the end when the asyncEnd event is reached. Note that this will safely exit the storage scope when the end event happens rather than the current state where we leave the context entered until the end of the tick, which is the concern we have with enterWith currently.

As TracingChannel is designed currently, it is possible to manually emit on the individual named channels and produce out-of-order data, but that's part of why we're specifically trying to provide an API for this, to ensure that people don't do it wrong. Also, the binding is to the individual store and needs to be done explicitly, so I feel it's somewhat on the store owner to ensure they trust the correctness of the tracing channel they want to bind.

channel = tracingChannel(channel);
}

channel.traceSync(runner, { data });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took me a while to find the transformation to move the published start data into the .data property which is used in start() closure to call transform.

So it seems like a publisher should use this API from AsyncLocalStore to publish to ensure its TracingChannel fits to AsyncLocalStore. I know it's not strictly required as creating a context object with a data property can be also done in producer code.

But isn't the target to decouple the producing side using TracingChannel and consuming side subscribing and maybe bind to AsyncLocalStore?

@Flarna
Copy link
Member

Flarna commented Nov 8, 2022

As TracingChannel is designed currently, it is possible to manually emit on the individual named channels and produce out-of-order data, but that's part of why we're specifically trying to provide an API for this, to ensure that people don't do it wrong

Do you have already a path in mind to get rid of this possibility to produce out-of-order data?

I fully agree and support the new additions here and a lot use case have no need for enterWith anymore:

  • in your own code one can use run already now
  • via TracingChannel a producer signals start/end consistent and by binding a consumer has no longer the need to use enterWith

So the tools are there so only the (small) task left to transform the world to use it :o).

@Qard
Copy link
Member Author

Qard commented Nov 8, 2022

Not a specific plan yet, but working on some ideas. We'll see what I can come up with. 🤔

@Qard
Copy link
Member Author

Qard commented Nov 8, 2022

I added a new change which now omits enterWith entirely. This eliminates the storage breakage risk. Let me know what you think.

There's some additional things I want to try to make the tracing channel level safer too, but shouldn't matter here anymore. Also, while the binding here now does only depend on one channel, I would still like to keep this on the TracingChannel interface as the purpose is both to bind the store and to provide a convenient tracing wrapper so we don't need to do both things separately everywhere.

lib/async_hooks.js Outdated Show resolved Hide resolved
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work so far.

I'm still unconvinced that we need a new TracingChannel to achieve this vs a bunch of static utilities. We could save a few allocations of TracingChannel as it does not hold any state.

This would also be confusing for users because of the DiagnosticChannel singleton nature.

@Qard Qard force-pushed the als-diagnostics-channel-integration branch from fc22266 to 2ba33f3 Compare November 10, 2022 01:11
for (const key in handlers) {
const channel = this.#channels[key];
if (!channel || !channel.unsubscribe(handlers[key])) {
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why end the loop just because there is maybe an extra key in the object?

}

channel = new TracingChannel(name);
channel._weak = new WeakReference(channel);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
channel._weak = new WeakReference(channel);
channel._weak = tracingChannels[name] = new WeakReference(channel);

TracingChannel#_weak is a setter, there is no getter otherwise this is always undefined

};
}

static runInTracingChannel(channel, data, runner) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this needs thisArg, ...args to be forwarded to the runner via traceSync.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noticed that AsyncLocalStore#run has no support to provide thisArg...

@Flarna
Copy link
Member

Flarna commented Nov 14, 2022

AsyncLocalStorage.runInTracingChannel() calls now traceSync.
Do you plan to add similar functions for in AsyncLocalStorage which wrap traceCallback and tracePromise?

Which criteria do we define for producers to decide if they should call traceSync vs runInTracingChannel?

@@ -9,6 +9,7 @@ const {
FunctionPrototypeBind,
NumberIsSafeInteger,
ObjectDefineProperties,
ObjectPrototypeHasOwnProperty,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

});
} : undefined;

if (onRequestStartChannel.hasSubscribers) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this needs to be changed now to call AsyncLocalStore.runInTracingChannel

@Qard
Copy link
Member Author

Qard commented Dec 15, 2022

Merging the concepts of this into #44943.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async_hooks Issues and PRs related to the async hooks subsystem. diagnostics_channel Issues and PRs related to diagnostics channel http Issues or PRs related to the http subsystem. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants