-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: mark AsyncResource and AsyncLocalStorage as stable #35286
Comments
cc @nodejs/tsc for awareness |
I'm extremely not comfortable with |
Domains were global, so If people are really so concerned about those specific parts of the API though, I'm happy to consider marking the rest of the API stable and just leaving those specific parts as experimental. Also, I'm less sure about the stability of AsyncLocalStorage than AsyncResource, given its relative newness in comparison, though I'm fairly confident at this point there are no significant issues remaining. |
@Qard The problem is just like how domain allows non-local and non-scoped manipulation of context in non-async boundaries. Your argument that only a "owner" will manipulate assumes that no-shared async storage would require that consumers have a specific coding style (in particular, not exposing a direct reference to an AsyncLocalStorage instance) which I don't find compelling. The docs show a local storage that is shared across multiple locations for example. |
The docs show an AsyncLocalStorage instance shared within the same user code, which is the intent. Sharing a context object with other modules or other peoples code in general is a bad idea and not encouraged. |
Agreed, but the concerns of sharing it with only |
The With Personally, I disagree that there is any issue with I entirely disagree that this is a repeat of the domains issues. Domains were problematic because they all shared one context globally--you can't be in more than one domain at a time. That's not the case here, each context is isolated. |
@Qard I do not agree with that last comment, hence me not wanting to move those methods out of experimental. We can have this discussion here if necessary but it likely would be more useful to have it elsewhere. Perhaps an explanation of why that specific API design which does have non-local affects is necessary would be more helpful. I'm still unclear on why APM is unable to use |
+1 on marking I would suggest adding a documentation section, regarding what node.js would in the future, when the other parts are stable, consider a major/minor/patch change to the async context graph. This would serve as a default guideline for module authors when they change the async context graph. PS: Be sure to also mark the C++ AsyncResource API stable as well. |
I would prefer to split those two things:
|
My initial thinking is along the same lines as @mcollina with leaving AsyncLocalStorage in experimental for a bit longer. |
Sounds good re: letting I do think we should make a more specific decision at least around when we should re-evaluate if |
@Qard agreed on thinking about the timeline for AsyncLocalStorage. For some of the other Experimental features we agreed on a set of exit criteria in terms of usage, etc. and that was useful in making the case when it was the right time to exit Experimental. |
@bmeck Here's a simple example of using a combination of const dc = require('diagnostics_channel')
const onHttpRequestStart = dc.channel('http.server.request.start')
const onMySQLQuery = dc.channel('mysql.query')
const als = new AsyncLocalStorage()
// If an incoming request is received, create a Trace object
// and set it as the context for the request.
// This channel message does not intercept the handler
// as it should not have access to mutate arguments. But it
// triggers this message handler before running `request`
// events, so it can be used to enter the context first.
onHttpRequestStart.subscribe(() => {
als.enterWith(new Trace())
})
onMySQLQuery.subscribe(() => {
const trace = als.getStore()
if (!trace) return
// Report the MySQL event to the trace
}) Note that to be able to start the als context for the http request, we would need to either use As for the comment about const app = express()
const als = new AsyncLocalStorage()
app.get('/user/:id', (req, res) => {
const requestData = {
userId: req.params.id
}
als.run(requestData, () => {
// A function from some other module
doAThing((err, user) => {
console.log(als.getStore().userId)
})
})
}) If function doAThing(cb) {
// BOOM. `requestData` is gone.
als.run({ some: 'data' }, cb)
} it will blow away the store entirely, because the two points are unaware of each other calling If function doAThing(cb) {
als.getStore().userId = 'some id'
cb()
} it will replace the tl;dr: Do not ever share a store with code you don't control. It's essentially the same concept as setting all your variables as globals. You're bound to run into a collision eventually and it's going to be hell to debug. Just don't do it. 😬 |
@Qard it is more complicated than that, consider: const als = new AsyncLocalStorage();
emitter.on('my-event', () => {
als.enterWith(store);
});
emitter.on('my-event', () => {
als.getStore();
});
ee.emit('my-event'); The problem with these bleed through APIs unlike ee.prependListener('my-event', () => {
als.getStore(); // gets the store BEFORE enterWith
}); Will escape this kind of implicit timing based side effect. ee.emit('my-event');
// AFTER
als.getStore(); // still has the store meant to span 'my-event' Will keep the store after the span it sought to encompass. If you combine the 2, you can get stale store values. I can imagine plenty more scenarios where this is true, but its pretty simple to see. I'm not against solving the problem of needing to span across calls, I am only uncomfortable with the current APIs. Per your example with |
That's intentional. It's setting the context for the rest of the sync tick, which all that is part of. That's actually specifically the functionality that is needed for using My APM agent can say I want the http request diagnostics_channel event to trigger |
@Qard I understand that is the design of the API, but disagree on the intent of the API being to be the API. The API is to solve the problem of wrapping around these contextual event spans. My example above is showing that the API is leaking outside of those spans. I understand how you can achieve some things with the APIs but I am not sold on the leakage being non-trivial. The problem that is attempting to be solved by those APIs seems to differ from the affects of the APIs. |
It's not leaking though. It's capturing exactly the span it's supposed to, as indicated in the docs which call out this exact behaviour. It is intentional that the context continues for the rest of the sync tick and not just a lexical scope. It is not trying to capture only the local scope of the function where it is placed, that's what |
@Qard it goes beyond the events and outside of the |
It's not going beyond the goal. It's achieving exactly the intended goal. That API was created by me, and I wrote it specifically for the purpose of pairing it with my now in-progress work on diagnostics_channel to be able to tell an AsyncLocalStorage to enter a context at the point that diagnostics_channel emits the request start event just before the server emits the request event. At that point, everything in the rest of the sync tick should be linked to that context, not just the current lexical scope. Rather than run() which says "make a new async scope and set its context to this" the enterWith() method is saying "for the current scope, set its context to this". |
@Qard It is unclear to me what you are talking about at this point, you are describing the API and not the problem it is trying to solve. I'm asking to understand why the API isn't considered to be leaking given that it doesn't appear to be wrapping the event but rather the event plus a long tail until the sync stack of anything unwinds. Repeating that you have achieve an intended goal is well and good, but it doesn't explain what the goal was, which I understand to be setting the store value to be something during the event so that it can be persisted. A more concrete example rephrasing my above comment: var { AsyncLocalStorage } = require('async_hooks');
var { EventEmitter } = require('events');
var als = new AsyncLocalStorage();
var ee = new EventEmitter();
var incr = ((i = 1) => () => i++)();
ee.on('my-event', () => {
var value = incr();
console.log('pre-entering: %s previous: %s', value, als.getStore());
als.enterWith(value);
});
ee.on('my-event', () => {
console.log('post-entering: %s', als.getStore());
});
function run() {
console.log('run');
ee.emit('my-event');
setTimeout(() => {
console.log('unrelated timeout got: %s', als.getStore());
}, 0);
}
run();
run(); You can see old values crossing into "previous:" and the timeout getting values even if it is outside the event. It would be good to understand why this is not problematic. I remain uncomfortable with those 2 APIs given examples like this one. |
An async server is a tree of spans of sync execution. This is modelled with async_hooks in the form of execution and trigger ids along with lifecycle events. In an app with nothing but a single set immediate, you will have the top-level sync tick execution span which, between the start and end of that sync tick, will emit an These before and after events will be linked to the init event through a shared async id, allowing one to traverse the async execution graph in id form and enabling correlation of anything between the before and after to be correlated to a point in that graph by using the current async id. All this complicated graph management is what The Being able to change the context object partway through the current tick is an intended feature and enables patch-free instrumentation. In the immediate term, it enables something like this: // Say I have a simple http server that makes a SQL query
// I want to create a trace for the request and attribute the
// mysql query to that trace...
const server = http.createServer((req, res) => {
mysql.query('...', (err, data) => {
res.json(data)
})
})
// ... a naive APM agent might do something like this:
const als = new AsyncLocalStorage()
// In a real-world scenario, we wouldn't have access to `server`
// so we'd have to patch the module loader to intercept the http
// when it is loaded and patch a large chunk of its API to basically
// just achieve what these three lines of code below are doing.
server.prependListener('request', () => {
als.enterWith(new Trace())
})
// A more real-world looking patch to collect spans from MySQL
// would look something like this...
const original = mysql.query
mysql.query = function wrappedMySQLQuery(query, callback) {
const trace = als.getStore()
const span = trace.startSpan('mysql', 'query')
return original.call(this, query, function wrappedMySQLQueryCallback(err, res) {
span.end()
return callback.call(this, err, res)
})
} With Anyway, my point is, the ability to change the context partway through a tick is an important feature for APMs in many places. The pre-http request example is the main one, but sometimes we need it for other types of servers. Another case is that APMs are encouraged to preload before app code via A ton of work has gone into getting these APIs to where they are now. Solving the problem of async execution graph tracking has literally been the core focus of my career for nearly a decade now, so I've put a ton of thought into it and built several APM companies agents around this model. While the |
Also, everything about your example is working exactly as intended. The context value is supposed to persist until the end of the tick so any async branches after that point will propagate that object into those async branches. If the object is changed midway through a tick, any future async inits in the remainder of that sync tick should propagate the new object. The "value" of the context between |
I'm +1 on promoting I would wait a little bit longer with
Regarding your discussion @bmeck @Qard FWIW .NET offers an |
I agree to this and do think we should have some way to solve it, but not to the points in my comments being non-trivial.
I have no real demands on the actual signature but it shouldn't be the end of the current tick that determines when it is changed. An API that matches the following would work fine to me: Something that persists until a synchronous termination before the end of the tick. APIs like Right now, the API to me bleeds past the intended synchronous call frame that it was hoping to wrap and I still don't see an explanation of WHY going past that call frame is intended and necessary, just explanations of that it does and it is intended. I am familiar with these APIs and async queueing, I don't think everyone needs to establish their level of familiarity here. TC39 also has had presentations on these issues which I have been a part of and I have had plenty of discussions on why AsyncLocalStorage style APIs are rejected there, most recently with Async Contexts. I am not trying to be as absolute as I would be in TC39 and do want to find a middle ground, but find that subset of the current APIs to be non-starters without clarification and exhaustive production usage from many sources showing no issues (either with development or runtime). |
It needs to be until the end of the tick though because the point is that we want to capture everything in that tick. An APM product wants to trace an entire request, not just part of it. At the start of the tick which runs the request handler until the end of it, we want to know everything that happens within that tick and within every async task descending from that tick so we can attribute that activity to the request that spawned it. If, for example, running the request handler was followed by running a setTimeout to timeout the request if it's not finished soon enough, we want to see that. We need our context object, which is probably our tracer, to be propagated into that so we have some way of finding the current trace within the descending async execution. I created enterWith() specifically because there are a few scenarios where we need to capture until the end of the tick. I was quite explicit in the documentation about warning exactly what this means. The suggested API for 90% of users is definitely the run() method, enterWith() is a special-case API. |
It's fine since the discussion is open :) |
Would it be possible to surface an als store through some events 🤔 ? Considering the example in this comment process.on('unhandledRejection', () => {
als.getStore() // undefined here
}) This would always be undefined. |
An unhandled rejection could come from anywhere, so no there's no way to bind that to an AsyncLocalStorage instance. The context in an event handler will be whatever the context was where the emit was called. In this case, it'd be in Node.js internals somewhere, not from the promise that the error originated from. |
@Qard should we remove from diag-agenda until we have a proposal for exit criteria? |
Sounds reasonable. :) |
@gireeshpunathil thanks a lot for the list of exit criterea, here is my take on it for the follwoing scope:
I am restricting to this scope as I believe this is a viable minimal API for AsyncLocalStorage that has no strong controverial points - there might be discussions about the DX of this scope but AFAIK there are no strong opposition against these methods and this constructor. This scope is, IMO, the most likely to reach consensus.
-> On the given scope, yes, the only debatable method is run and if we decide to make an alternative the mains canonical way of starting context spreading, there will always be a trivial way of building run over it.
-> I am not aware of any platform limitation for this scope
-> AFAIK there is no bug opened on the API per se. There is a know performance cost in promise-heavy environment but it should not be considered as a blocker: enabling AsyncLocalStorage is optional and the end user has the choice (directly or through packages they use) to pay that cost. However there are bugs linked to AsyncLocalStorage when ecosystem packages break context propagation (for instance here nodejs/help#3186). The solution is actually to move the API to stable to push adoption/compatibility forward and get rid of these bugs.
-> I believe there is no impact here
-> There are 15 test files for the AsyncLocalStorage API covering multiple use cases and situations. (All the ones which name start with
-> This one is probably the one where we probably can improve the most. I have spent a lot of time debugging context spreading issues and I am probably too used to this to be able to give a satisfactory response here. When we built this API, I had the feeling that it would mostly be used by APM/RASP providers. We have opened issues about the API in this repo and in nodejs/help that actualy prove me wrong on this (which actually makes me very happy). There is probably room for a debugging guide or at least a few debugging tips in the doc and I woul dbe happy to add them to the PR that would mark the scope as out of experimental.
-> There have been a few iterations on the doc and I believe it is clear enough as of right now.
-> It is hard to find numbers for this but the DataDog APM for Node.js actually uses this API which means there must be a large amount of Node.js live application using it through this module.
-> AFAIK @Qard is still working on JS-side promise hooks, which will affect all of this, but only as an implementation detail and we can move forward without blocking on it. This is a very personal analysis which has 2 major biases:
Thanks again @gireeshpunathil for providing this great framework. |
We are using AsyncLocalStorage in Kuzzle since more than 7 months and we didn't encounter any issue or any bad API design flaws. Despite we know our clients are using this feature in production since months, since we are developing a Node.js framework we can affirm that many other users are using AsyncLocalStorage without any inconvenient. 👍 for @vdeturckheim proposal to make this API stable so other products may want to take advantage of this feature as well |
That scoping seems fine to me. |
I agree with that scoping also. Nothing controversial in there to me, and all that has been battle-tested for months now in the Datadog APM agent. Any issues we've had have been with context loss due to |
+1 for making non-experimental. @vdeturckheim do you know when you might open the PR so I can keep an eye out for it ? |
@mhdawson I have a project ending early next week, so I planned to work on this in the next couple weeks (there might be some big doc updates to do) |
+1 for making non-experimental. |
@vdeturckheim thanks for the update and for doing the work to get it out of experimental. I think its a really important feature and appreciate you moving it forward. |
+1 for making non-experimental. We are applying it in a production environment. For the past two months I've been working with Referenced code: https://github.com/midwayjs/hooks/blob/ecc0b83a4fae5554cfbec8e61ca6ba74bc3f6135/packages/hooks-core/src/runtime/index.ts#L1-L23 We have over 1000 Midway full-stack applications in our company and will be upgrading to use the |
+1 for making this non-experimental. We at Airlift are also seriously considering it for our production environment. |
We stabilised a subset of the API that should be prod ready in #37675 :) |
@vdeturckheim I think maybe we should close this for now as "complete" and open a new one to stabilize other parts at a later time when that makes sense? |
We've stabilized what I wanted to see stabilized, so I'm going to call this done. Other parts of async_hooks, or more likely further higher-level APIs on top of it, can have their own stability timeline discussions elsewhere. |
The
AsyncResource
API has been largely unchanged since its creation many majors ago, andAsyncLocalStorage
has been out in the wild for awhile now and has proven its utility. I think both should be considered for stable status at this point. The challenge I see is that they live in theasync_hooks
module which has been marked as "experimental" at the top-level because theasync_hooks
feature itself exposes internals thus many (myself included) were unwilling to accept it as a "stable" API.What I suggest is that
AsyncResource
andAsyncLocalStorage
be considered as stable in the next major. Due to the module being named after the one not stable feature, it may also be worth moving the two APIs out to new top-level modules, but I'll leave that last point for discussion.How should we proceed in getting these two APIs to stable status?
cc @nodejs/diagnostics
The text was updated successfully, but these errors were encountered: