Making thread-local functions viable #46
In case it was missed, #42 did include a sketch for supporting shared continuations calling into local JS using the shared-barrier and context locals.
How is But maybe you could restrict to shared results? Still seems slow though.
For builtins like
Wrt. the 'weak form' here, I am still skeptical that this is viable. It means that toolchains must manually 'root' all their thread-local functions so that they're always reachable through unshared references. We also need to come up with some definition of which refs count as shared/unshared (do instance imports count?). It has also sounded like some engines wish to move to the strong semantics (which would be allowed by the weak semantics), at which point engines that only implement the weak semantics are at high compat risk of breakage.

When it comes to 'thread bound' references, are the 'weak' semantics acceptable for the use-case in #37 (holding a DOM node alive by a shared object)? That would require rooting every DOM node referenced in a
@eqrion, would you be ok with the strong shared-to-unshared GC semantics with no guarantee of cycle collection? (Or perhaps a promise of no cycle collection, for compatibility?) This would be equivalent to the semantics of supporting shared-to-unshared edges in

I would strongly prefer that semantics to the weak semantics. That would be enough to let us have thread-local functions callable as normal (shared) imports, and thread-bound data to let shared functions pass arbitrary JS objects around as
Thinking about the difference between

I haven't had time to consider it fully, but I also liked the sound of what @eqrion suggested in #42 for solving problem (2) because, while (2) may be showing up acutely in a JS context right now, the problem is not limited to JS and could come up in a pure wasm setting. E.g., let's say I create N unshared module instances in N Web Workers, have them all import 1 shared module instance, and I want the shared module instance to be able to call back into the calling unshared module instance (which is really just restating problem (2) in pure-wasm terms). It's hard to say how valuable this pure-wasm scenario is, but giving wasm expressive parity does seem generally good if it's not insanely complicated.

But lastly, if we're exploring alternative solutions to (2) that work at the JS API level to avoid adding two kinds of
I think I'd mentally lumped this into a "really add a context local feature" bucket which I'm still not sure about. I do like the idea that at each resume point you just naturally have to reconstruct the current environment/thread's context. One could almost think of the context locals as the only bit of state that isn't captured by a continuation. This interpretation makes them meaningfully distinct from function arguments and regular locals. Taking this last point further, instead of saying that resuming the shared continuation requires the reconstruction of the unshared state, could we say that resuming any continuation must inherit the current context at the resumption point wholesale? The context shape would be part of the type of the function/continuation, just like regular function arguments, with a static error if the context at the resumption point doesn't have the right matching shape. This is like fixing an ABI that must be respected in order to interact with captured continuations. We could make switching contexts imply a suspend barrier to facilitate this. With this, there's a minimal design that gets us JS interaction that assumes all add context locals with just a
I think we'd have to put our Wasm 1.0 hats on, and assume that any functions called in this way can't meaningfully pass references back into
I'm gesturing at The reason I think
I think I don't have a clear picture of how this would work. If the outer JS function wrapper appears simply as
I could be wrong here and @eqrion please tell me if so, but my understanding of context-local storage in #42 (which I understood as an excellent type-y generalization of the stack-local storage I suggested in #34) is that context-local storage is mostly like function arguments and locals in that the storage does follow the stack around, just like params/locals. It's only the special case of
Maybe I'm missing the nuance you're getting at, but I thought that the key difference we discussed in the meeting was, when you attempt to use a
Given that you're already familiar with them, I think the best way to understand the JS API approach I was suggesting in #34 is in terms of algebraic effects: calling from JS into wasm would install a "call-unshared" effect handler, then we introduce a JS built-in that can be imported by wasm as a

(Now that I talk through this, though, it does really seem like something we could just as well do in core wasm; there's nothing JS-y about it.)
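To make the effect-handler framing concrete, here is a minimal sketch, in plain JS, of the idea as I understand it: entering Wasm from JS installs a handler, and a builtin routes "call-unshared" requests to the innermost handler. All names (`enterWasm`, `callUnshared`) are hypothetical stand-ins, not proposed API.

```javascript
// Handler stack simulating the dynamic scoping of effect handlers.
const handlerStack = [];

// Calling from JS into wasm installs a "call-unshared" handler for the
// duration of the call.
function enterWasm(callUnsharedHandler, body) {
  handlerStack.push(callUnsharedHandler);
  try { return body(); }
  finally { handlerStack.pop(); } // uninstalled when the wasm call returns
}

// The shared-importable builtin: performs the "call-unshared" effect,
// dispatching to whichever handler the current entry installed.
function callUnshared(name, arg) {
  return handlerStack[handlerStack.length - 1](name, arg);
}

const result = enterWasm(
  (name, arg) => (name === "double" ? arg * 2 : undefined),
  () => callUnshared("double", 21) // shared wasm code performing the effect
);
console.log(result); // 42
```

The key property is that the same shared code reaches different unshared functions depending on which thread's entry point installed the handler.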
I think this fits the sketches from previous issues. What I'm proposing is essentially an alternative design that runs with this "special case" - quote:
This means that we no longer need to reason about context locals being captured (edit: and thus a

In practice, I think toolchain ABIs would use a fixed list of JS functions for their context in all shared Wasm functions, so continuations will "just work". If this is a long list, maybe it makes sense to go with

EDIT: and moving between ABIs is still possible through some
Splitting this edit out into a separate comment as I think it's an important point: The behaviour I'm proposing is also more consistent with an interpretation of context locals as a version of thread-local storage that is only scoped to a Wasm call stack (i.e. allocated/determined at the JS->Wasm transition point). If you resume a continuation in another thread, you get the context locals for that thread's current call stack, instead of attempting to capture the suspendee's original "call-stack-local storage" which conceptually lives at the base of the stack and not in the portion that was suspended.
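A tiny sketch of the resume behaviour described above, with a JS generator standing in for a shared continuation and an explicit argument standing in for the resuming stack's context (both are simulations; nothing here is proposed API):

```javascript
// "Call-stack-locals": context is determined by whichever stack resumes
// the continuation, not captured by the continuation itself.
function* sharedWork() {
  // On resume, the suspended code observes the *resumer's* context,
  // modeled here as the value passed to next().
  const ctx = yield "suspended";
  return ctx.threadName;
}

const k = sharedWork();
k.next(); // run up to the suspend point on the original thread

// Later, a different thread resumes; its own call-stack context applies.
const r = k.next({ threadName: "worker-2" });
console.log(r.value); // "worker-2"
```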
Ah hah, I see it now, and I really like it. So, iiuc, your proposal implies a simple GC stack root and thus avoids all the GC complexities. It also seems efficient to implement (no extra anything on indirect or import calls). If that's right, it would seem to check all the boxes. Would you like to name your new proposal (to distinguish it from the last N iterations)?
We could call this semantics "call-stack-locals"? I actually like the "context locals" name more (if we have the

EDIT: let's just call them "call-stack-locals" so long as they're one of several options
Yes, "context locals" does sound like the right name; perhaps we could call your proposal "context locals variant 2" in the short-term and drop the "variant 2" if we go with it once the dust settles.
It's worth noting that your point here is fair. I think this is manageable with the trick @tlively laid out previously - if you want such a value to be captured by the continuation, just read it into a stack slot or local variable before suspending, and then "restore" it upon resumption. In general I think the ABI will need to coordinate carefully on how the shadow-stack pointer is managed in the presence of shared continuations, no matter what specific semantics we pick.
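A sketch of that save/restore trick, again simulating the continuation with a generator and the context local (e.g. a shadow-stack pointer) with a mutable `tls` record. All names are hypothetical; in real output this would be toolchain-emitted Wasm around each suspend point.

```javascript
// Stand-in for a context local / TLS slot: NOT captured by continuations,
// and reset by whatever thread resumes.
const tls = { ssp: 0 };

function* sharedTask() {
  tls.ssp = 1234;           // toolchain sets up its shadow-stack pointer
  const savedSsp = tls.ssp; // read the context local into a "stack slot"
  yield;                    // suspend: stack slots ARE captured, tls is not
  tls.ssp = savedSsp;       // restore the value upon resumption
}

const task = sharedTask();
task.next();   // run until the suspend point
tls.ssp = 0;   // the resuming thread starts with a different context
task.next();   // resume: the saved value is written back
console.log(tls.ssp); // 1234
```

The cost is one extra local per live context value per suspend point, which is why the ABI coordination mentioned above matters.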
I have to think about that a bit. I'm not sure how that would be specified or communicated to users. As Luke said above, the FinalizationRegistry isn't really an edge, just a callback, so its inability to collect cycles makes sense (and users are likely able to reason through that). Whereas here, we'd have something that is an edge in the graph, but we'd just be saying that cycles through it are not collectible. Re: call-stack-locals, that seems like another interesting way to do it. It may be cleaner than specifying some ad-hoc 'shared context locals gets defaulted on suspend'.
I wrote/said this somewhere else, but don't want it lost. One engine difficulty for TLS globals is that if the engine is pinning the TLS block to a register (as expected for performance), when resuming a shared continuation we need to iterate over all the stack frames we're resuming to lookup the new TLS blocks for the new thread. I don't think this will be cheap, it's at least a linear cost. The other alternative would be to re-lookup TLS blocks whenever leaving from a module crossing (in addition to entering a new module), but that adds cost even if you're not suspending. But then, if it's expected that toolchains will just undo all of this by setting the SSP to the value it had on the old thread, all of that work is wasted.
The thing that gives me most hope is that it appears this design variant actually lets us stick with just the

Iterating a little further - it probably helps V8's desired inlining optimisations if at least some context locals can be marked immutable. This would mean that code could be speculatively optimised to assume a particular context (and hence particular JS

EDIT: and maybe on the JS side you'd want to eagerly bind (at least the immutable parts of) a context to a function during thread setup in order to make this test particularly fast
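A sketch of the guarded speculation this enables: if an immutable context local is expected to hold a particular JS function, compiled code can take an "inlined" fast path behind a cheap identity test and fall back otherwise. The names here (`makeOptimisedCall`, the fallback split) are illustrative only.

```javascript
// Speculative call through an immutable context local: guard on function
// identity, then run an "inlined" version of the expected callee.
function makeOptimisedCall(expectedFn, inlinedFast) {
  return (ctxFn, x) =>
    ctxFn === expectedFn
      ? inlinedFast(x) // speculated path: expected context function
      : ctxFn(x);      // generic path: context held something else
}

const mathAbs = Math.abs;
const call = makeOptimisedCall(mathAbs, (x) => (x < 0 ? -x : x));

console.log(call(mathAbs, -5));      // 5, via the fast path
console.log(call((x) => x * 2, -5)); // -10, via the generic path
```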
One thing I'm trying to work through mentally WRT context locals is reentrancy. Is it ever expected that one might want to have a call graph that looks like
Yes, this pattern is extremely common in Emscripten.
I've thought about this a bit more; here's some more thinking. I think some form of 'strong' semantics (a shared-to-unshared edge can keep an unshared thing alive even if the shared thing is only reachable from a different thread) is necessary for the

With 'weak' semantics, toolchains would have to have the unshared thread-bound value be rooted in some unshared JS value in addition to the shared data structure, which would then be equivalent (from a lifetime perspective) to just storing a handle in the shared value that references the rooted unshared value and having manual memory management - at the cost of GC engine support that I don't think would give you much ergonomic advantage over using handles.

With the 'strong' semantics that do not have a guarantee of cycle collection (or promise no cycle collection), my concern is how this appears to users. If we require developers to use

The above is mostly in reference to
I agree that weak semantics are no better than storing rooted handles like linear memory languages must do today. I don't think that is sufficient to meet the needs of languages targeting WasmGC. That leaves us with strong semantics with or without cycle collection. I agree that there are risks of surprising memory leaks if we don't support cycle collection, so I would prefer to support cycle collection if we can get away with it technically. |
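For reference, the manual handle scheme that both comments above compare against (what linear-memory languages do today, and what weak semantics would effectively reduce to) looks roughly like this. The function names are hypothetical; the point is the explicit lifetime management.

```javascript
// Manual rooting via a per-thread handle table: shared data stores only an
// integer handle; the real unshared value lives in this table and must be
// freed explicitly.
const handleTable = new Map();
let nextHandle = 1;

function rootUnshared(value) {
  // Called on the owning thread; the returned integer is safe to place in
  // shared memory or a shared struct.
  const h = nextHandle++;
  handleTable.set(h, value);
  return h;
}

function derefHandle(h) {
  // Only meaningful on the owning thread.
  return handleTable.get(h);
}

function freeHandle(h) {
  // Manual memory management: forgetting this call leaks the value.
  handleTable.delete(h);
}

const h = rootUnshared({ node: "some DOM node" });
console.log(derefHandle(h).node); // "some DOM node"
freeHandle(h);
console.log(derefHandle(h)); // undefined: classic use-after-free hazard
```

This is exactly the ergonomic burden (leaks and dangling handles) that strong GC semantics would remove for languages targeting WasmGC.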
The state of things
There are two key problems that we absolutely must solve to express any reasonable compilation scheme using this proposal: (1) accessing thread-local storage, and (2) calling thread-local (unshared) functions, such as JS imports, from shared functions.
Discussion recap
The initial draft of this proposal aimed to solve (1) with thread-local globals, and (2) with thread-local functions. However, concerns around implementation feasibility were raised about both of these (#34, #42).
In #42 @eqrion proposed a different approach, which (interpreted minimally) involves using function parameters/contexts to pass around a thread ID (allowing (1) to be handled programmatically) and a record of important JS functions in the current thread (allowing (2) to be handled through `ref.call`). The latter requires us to relax our interpretation of the `shared` annotation to allow `nonshared` reference parameters.

This alternate approach has implications for future shared continuations (work stealing), which I talk about here (#44). If we believe shared continuations will exist in the future, we will need both versions of `shared` at once - `shared-fixed` (allows non-shared params) and `shared-suspendable` (disallows them, as in our current semantics). The latter will need its own mechanism for solving (2), with all of the same constraints we're trying to avoid now. Moreover, we will need to work through the design for composing `shared-fixed` and `shared-suspendable` function calls, potentially requiring extra language features to facilitate this (such as the `shared-barrier` discussed in the issue).

Thesis

The more I work through the details of the above, the more I'm struck by the amount of future core language complication, and standardisation effort, that we could avoid if we can find an acceptable solution for (2) now that also works with the `shared-suspendable` semantics.

Therefore, I believe we should redouble our efforts towards this end. If we find a solution, we avoid a lot of future mess. For now, I'm happy to consider (1) "minimally solved" through the thread ID-passing strategy.
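As a rough illustration of that thread ID-passing strategy for (1): rather than true thread-local globals, every shared function takes a thread ID and indexes per-thread storage with it. This is a toy JS model of toolchain output, not proposed semantics; all names are hypothetical.

```javascript
// Per-thread storage indexed by an explicitly-passed thread ID, replacing
// thread-local globals.
const NUM_THREADS = 4;
const tlsSlots = new Int32Array(NUM_THREADS); // one "TLS" slot per thread

function sharedFn(threadId, delta) {
  // A "thread-local" access compiles to an ordinary indexed load/store.
  tlsSlots[threadId] += delta;
  return tlsSlots[threadId];
}

console.log(sharedFn(0, 10)); // 10
console.log(sharedFn(1, 5));  // 5
console.log(sharedFn(0, 1));  // 11 - thread 0's storage is independent
```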
Some possible ways forward
These are to spark discussion. I hope people can brainstorm variations of these, or other fresh ideas.
Eval in current realm
Introduce into the JS-API a new function, which I'll call `eval_realm` for simplicity, importable as `shared`. This takes a `shared externref`, interpreted as a string, and calls the current JS realm's `eval` function on this string. As discussed with @syg, shared Wasm functions will need a per-realm prototype, so I think this per-realm dispatch can be semantically justified.

Through compilation scheme wizardry (such as creating an initial table of meaningful strings), arbitrary JS access can be bootstrapped from this function, although it would be quite slow! To make key functions faster, this could be combined with the below strategy, with `eval_realm` used as a fall-back to invoke arbitrary JS.
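A sketch of what that bootstrapping could look like, with plain indirect `eval` standing in for the hypothetical `eval_realm` builtin (which does not exist; this is purely illustrative):

```javascript
// Stand-in for the proposed eval_realm builtin: evaluate a string in the
// current realm's global scope.
const evalRealm = (s) => (0, eval)(s);

// A compilation scheme could ship an initial table of meaningful strings...
const stringTable = [
  "(x) => Math.sqrt(x)",
  "(msg) => { globalThis.__log = msg; }",
];

// ...and materialise per-thread JS functions from it at startup, giving
// shared code access to arbitrary realm-local behaviour (slowly).
const localFns = stringTable.map(evalRealm);

console.log(localFns[0](9)); // 3
localFns[1]("hello");
console.log(globalThis.__log); // "hello"
```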
used as a fall-back to invoke arbitrary JS.Make more JS builtins
shared
In the spirit of @eqrion's string-builtins, expose a larger list of functions (such as `Math.*`) that are importable as `shared`. Possibly come up with some lightweight standards process to add additional functions. This has the bonus of clearly supporting inlining optimisations, but doesn't allow the execution of arbitrary user-defined JS, unless combined with the above `eval_realm` strategy.

Revisit thread-local functions
I still think a weak form (ref "flavor 2" here) of thread-local function could potentially be viable in the short-term. The semantics I envisage: the thread-local function's ephemeron in a thread is guaranteed not to be collected only so long as the thread-local function is rooted through purely non-shared references in the same thread. If the above ever becomes untrue, future calls to the function in this thread may non-deterministically trap.
There are two objections:

- engines must be able to call a `shared` thread-local function, and handle the keeping-alive of the underlying ephemeron correctly in this case. My semantics above allows some simplifying assumptions:

I believe the objections above could possibly be overcome if our alternative solutions are unattractive or require greater implementation effort, and through comparison to the implementation-specific behaviours already exposed through `WeakRef` and `FinalizationRegistry`.

If there are orthogonal concerns about the implementation complexity of thread-local function `bind`, it's possible to remove this function, creating a restricted form of thread-local function that can only be called in the thread in which it is first created/bound. Note the similarities to @syg's sketch for thread-bound JS objects. To facilitate "cross-thread" calls of (e.g.) `console.log`, the compilation scheme would need to create a thread-bound-wrapped `console.log` in each thread, and use the thread ID to dispatch to the correct one (so each call site for `console.log` in Wasm becomes a lookup in some table of thread-bound functions, based on the thread ID).

Thread-bound and weak thread-local functions still give us a forward-compatible path towards the "ideal" semantics. The former can be accomplished by re-introducing `bind`. The latter, through interpreting the strong semantics as turning the non-deterministic successes/failures of the weak semantics into deterministic successes.
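A toy model of the thread-bound dispatch scheme sketched above (per-thread `console.log` wrappers selected by thread ID). The registry and function names are invented for illustration; real thread-bound functions would be engine objects, not closures.

```javascript
// Each thread registers a thread-bound wrapper at startup; shared code
// dispatches through the table using the current thread ID.
const threadBoundLogs = new Map(); // threadId -> thread-bound log wrapper

function registerThread(threadId, sink) {
  // In the real design this would wrap console.log as a thread-bound
  // function object callable only on its owning thread.
  threadBoundLogs.set(threadId, (msg) => sink.push(`[t${threadId}] ${msg}`));
}

function sharedLog(threadId, msg) {
  // Every console.log call site in Wasm becomes a table lookup keyed by
  // the thread ID, followed by a call to the thread-bound wrapper.
  threadBoundLogs.get(threadId)(msg);
}

const out = [];
registerThread(0, out);
registerThread(1, out);
sharedLog(0, "hello");
sharedLog(1, "world");
console.log(out); // [ '[t0] hello', '[t1] world' ]
```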