Lack of atomic.wait on the main thread seems limiting to a fault #106
Comments
The short answer is that browsers are "never" going to allow the main thread to wait. This really has little to do with jank (even if that was a concern early on) but is mainly about implementation reality; browsers use the main thread for all sorts of housekeeping tasks on behalf of other threads, and allowing the main thread to be blocked by user content is a recipe for locking up the browser and/or breaking user programs. Specifically, the main thread may be required to do work on behalf of worker threads that are not themselves blocking from user content, but which are in fact blocking while waiting for the main thread to perform the work; if the main thread is blocked in user content, the worker threads will not be able to advance to the point where they can do the work that unblocks the main thread.

We went over this a (large) number of times for JS and that's how it's going to be. We went over all the workarounds you propose, and found them wanting for that reason; it doesn't matter how you wait, it's that you wait that's the problem, since the main thread may not be able to do work that you think is concurrent while you're making the main thread wait.

As a technical matter, JS and Wasm allow any thread to wait, but there is a flag on the agent (i.e., on the thread) called [[CanBlock]], available only to the embedder, that determines whether wait throws or not on that thread. Browsers set that to false for the main thread. See here.
By and large, they can't work on the browser's main thread; they have to be moved into a worker.
Not truly, though see below for a workaround that has some traction.
For main-thread code written in JS, there is a pattern ("asynchronous wait") that probably works, where you call `Atomics.waitAsync` instead of blocking and continue when the returned promise resolves.
We sort of expect that main thread modules will use shared memory, because the main thread will be important for interacting with the DOM and the browser in general, and it is therefore the main conduit for I/O with the browser; but the mechanisms by which these main-thread modules synchronize will have to be something other than classical locks - as in the case of JS.

Again, Spectre halted us in our tracks wrt exploring this territory. Up until then, we were moving to a state where entire apps were sequestered to clusters of workers, communicating on some kind of channel with the main thread. This channel can be in shared memory, provided the main thread can find a workable solution for synchronization; or it can be partly in shared memory (for the bulk of data) and partly with postMessage (for synchronization / wakeup); or we can try to come up with something better.

With object support - just anyref is sufficient - in Wasm, there's really no reason why Wasm can't call JS's Atomics.waitAsync directly and just return the object to JS, though of course that requires unwinding the wasm stack at this point - not exactly desirable. If we get something like JS's async into wasm, or more likely a basic notion of coroutine, then this changes fundamentally. Main-thread code can waitAsync, then do a directed yield to a coroutine that will return to JS, passing the Promise object along with it; when the promise is resolved it can call back into Wasm, which does a directed yield back to the coroutine that blocked. It's not fast, but it may be adequate, and it works for the web (and current browser architectures). Another thing that we could envision is that the "main thread" in a wasm application is always some kind of coroutine that can block in the normal way (i.e., it's actually a thread in the implementation), with directed yields to this coroutine on entry to wasm and directed yields back to the coroutine that represents the JS main thread on callouts to JS.

This is not without peril but perhaps worth talking about. And finally, will Wasm threads be full Web Workers? Probably not. So this story is pretty open still. EDIT: Clarified some of the blue-sky discussion re coroutines.
Thanks so much for taking the time to read and respond @lars-t-hansen! (and so quickly and thoroughly!). This definitely clears things up for me wrt the current state of affairs; I had no idea the main thread was so important for other assorted tasks! Also, sorry about this, but I definitely should have led with "I don't want to reopen any old wounds" - I can only imagine the amount of debate this has already generated. I did want to try to clarify one point though:

This makes sense to me! It's definitely the end state we'd like to land in for Rust (and I imagine other languages would like this as well), where wasm can drive DOM operations (and even quickly with host bindings!) while doing compute-heavy tasks externally in workers. I was almost totally sold on the "it's ok to have custom synchronization" point when I was working on a small raytracing demo with threads. Then I ran into an exception where the main thread executed `atomic.wait`.

This issue may actually be better titled "how are memory allocators supposed to work?" rather than being about `atomic.wait` in general. This does mean, though, that the primary motivation for opening this issue, memory allocator synchronization, may have a more focused solution. I was talking with @tschneidereit this morning and it sounds like there are discussions about a "standard library libc-like thing" for wasm which might be able to come with a memory allocator, and if implemented by the browser it could presumably implement synchronization safely (as it's known it wouldn't block the thread for too long).

Do you (or others?) have thoughts though on how to solve this in the near (or long?) term? Is there perhaps a convention we could shoehorn into most language runtimes to work well with the vision of "main thread does I/O, workers do compute" while both can allocate memory? One idea we had was to just have the main thread spin-loop waiting for the lock to be released (but all workers would use `atomic.wait` as usual).

(Also FWIW I don't fully understand the coroutine idea, but it sounds quite promising!)
Right, that has the same issue in that none of the browser's main thread work can execute.
One solution would be to give the main thread its own pool to allocate from. If this pool is ever exhausted, it can synchronously call `memory.grow` to get more memory, without taking a lock.
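A minimal sketch of this idea in Rust (all names hypothetical): a per-thread bump pool whose refill path never takes a lock. On the Web the refill would be a `memory.grow`, which never blocks; here it is stubbed out with a plain `Vec` so the sketch runs anywhere.

```rust
use std::cell::RefCell;

// Hypothetical sketch: a private bump pool for the main thread. On wasm the
// refill step would call `memory.grow` (which never blocks); here it is
// modeled by growing a plain Vec so the sketch runs natively.
struct MainThreadPool {
    chunk: Vec<u8>,
    offset: usize,
}

impl MainThreadPool {
    fn new(size: usize) -> Self {
        MainThreadPool { chunk: vec![0; size], offset: 0 }
    }

    /// Bump-allocate `len` bytes, refilling the pool synchronously if exhausted.
    fn alloc(&mut self, len: usize) -> &mut [u8] {
        if self.offset + len > self.chunk.len() {
            // Fallback: grow the private pool. No lock is taken, so this is
            // safe to do on a thread that cannot `atomic.wait`.
            let new_size = (self.chunk.len() * 2).max(self.offset + len);
            self.chunk.resize(new_size, 0);
        }
        let start = self.offset;
        self.offset += len;
        &mut self.chunk[start..self.offset]
    }
}

thread_local! {
    // One pool per thread; only the main thread would actually need it.
    static POOL: RefCell<MainThreadPool> = RefCell::new(MainThreadPool::new(64));
}

fn main() {
    POOL.with(|p| {
        let mut p = p.borrow_mut();
        let a = p.alloc(48).len();
        let b = p.alloc(48).len(); // forces a refill past the 64-byte chunk
        println!("allocated {} bytes total", a + b);
    });
}
```

The key property is that the slow path (refill) is synchronous but lock-free, so it never trips over the main thread's inability to wait.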
Another way to dodge this issue is to run all user code in workers and use asynchronous or proxying interfaces to interact with Web APIs through a (non-allocating) shim that runs on the main thread.
@binji it's true, yeah, we're still locking things up! I think my broader point is the fact that the main thread can't execute `atomic.wait` at all.

It's definitely possible to have thread-local allocators and not too hard to set up! That has the restriction, though, that by default you can't actually send the memory to other threads to get deallocated. It... may be possible though to architect an allocator like this? If each thread had its own allocator and could "deallocate" memory from any thread, we could semantically allocate memory without acquiring a lock and also free memory without acquiring a lock. Such an allocator would just mean that if you constantly allocate memory on the main thread and then free it on the worker threads you'd quickly run out of memory...

@tlively yeah definitely! I was under the impression, though, that one of the goals of the threads+wasm proposal was to have a shared module on the main thread and worker threads. If that use case isn't desired then there's certainly no issue at all :). Right now it seems like that's the only feasible way to architect an app (all workers or only on the main thread). That model, however, is much more difficult to program against, I think, when you're an arbitrary library and you're trying to work within most applications that might use you (both threaded and not).
Just some musings: could the allocators use `Atomics.waitAsync` for this? Deallocation would now be asynchronous (though this fact is hidden from the programmer), but that's better than leaking memory, right?
It's one thing for the module to be shared; that doesn't mean the main thread and the worker threads have to run the same code, they "just" have to be compatible. (Once wasm threads are actual threads and not web workers the sharing will be a fact of life in any case.) Of course this is awkward, but the reality of the web is that it is asynchronous and these concessions have to be made.

I like the observation that this is in some sense more fundamentally about memory allocation than anything else. (Really, about managing any resource from a shared pool.) But it probably follows from the asynchronicity of the web that this management must at least in some ways be asynchronous. But you can sometimes choose where to put your asynchronous operations. Suppose, for example, you create an infallible allocator (used by all threads) that has a lock-free data structure over a set of size-segregated free lists, so a lock won't normally be needed for allocation or deallocation; where you fall back to trying to take the heap lock to grow the heap when you can't allocate, and this will usually succeed for any thread; and where the main thread's fallback for failing to take the heap lock is to just execute `memory.grow` directly, since that does not require waiting.

Speaking of synchronization, a couple of points worth making. The performance of postMessage is usually fairly awful as it involves a lot of browser machinery. A mechanism like `Atomics.waitAsync` should be significantly cheaper. One weird aspect of an asynchronous event-driven design with promises is that a thread - the main thread for sure, but also all the others - represents an unbounded number of concurrently existing coroutines. By means of `Atomics.waitAsync`, any number of those coroutines can be waiting on shared-memory locations at the same time without tying up the thread itself.
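The allocator design described above can be sketched roughly as follows (a toy model, not an implementation: blocks are indices into a preallocated slab, ABA protection is omitted, and the "grow" paths are stubbed). The point is the shape of the three paths: a lock-free fast path, a `try_lock` grow path, and a main-thread fallback that grows without ever waiting.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

const NIL: usize = usize::MAX;

/// One size class: a lock-free Treiber stack of free block indices.
/// (A real allocator needs ABA protection; this sketch ignores it.)
struct FreeList {
    head: AtomicUsize,
    next: Vec<AtomicUsize>, // next[i] = block after i on the free list
}

impl FreeList {
    fn new(capacity: usize) -> Self {
        let list = FreeList {
            head: AtomicUsize::new(NIL),
            next: (0..capacity).map(|_| AtomicUsize::new(NIL)).collect(),
        };
        for i in 0..capacity {
            list.push(i); // initially every block is free
        }
        list
    }

    fn push(&self, block: usize) {
        loop {
            let head = self.head.load(Ordering::Acquire);
            self.next[block].store(head, Ordering::Relaxed);
            if self.head.compare_exchange(head, block, Ordering::AcqRel, Ordering::Acquire).is_ok() {
                return;
            }
        }
    }

    fn pop(&self) -> Option<usize> {
        loop {
            let head = self.head.load(Ordering::Acquire);
            if head == NIL {
                return None;
            }
            let next = self.next[head].load(Ordering::Relaxed);
            if self.head.compare_exchange(head, next, Ordering::AcqRel, Ordering::Acquire).is_ok() {
                return Some(head);
            }
        }
    }
}

/// The common path is lock-free; growing takes the heap lock; a main thread
/// that cannot wait falls back to growing directly (the analogue of calling
/// `memory.grow`, which never blocks). Growth itself is stubbed here.
fn alloc(list: &FreeList, heap_lock: &Mutex<()>, on_main_thread: bool) -> usize {
    if let Some(block) = list.pop() {
        return block; // fast path: no lock
    }
    if let Ok(_guard) = heap_lock.try_lock() {
        list.push(0); // stub: a real allocator would grow the heap here
    } else if on_main_thread {
        list.push(0); // main-thread fallback: grow without taking the lock
    } else {
        // A worker may simply block until the lock is free; elided here.
    }
    list.pop().expect("grow produced a free block")
}

fn main() {
    let list = FreeList::new(2);
    let heap_lock = Mutex::new(());
    let a = alloc(&list, &heap_lock, true);
    let b = alloc(&list, &heap_lock, true);
    let c = alloc(&list, &heap_lock, true); // list empty: exercises the fallback
    println!("allocated blocks {a}, {b}, {c}");
}
```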
Oh fascinating - the usefulness of `memory.grow` not requiring a lock hadn't occurred to me! I also like your generalization to management of shared resources. While a memory allocator is probably one of the first issues to come up, it's surely not the last! I'd definitely want to prove this out though to see how feasible it is to implement this strategy (particularly the allocator). It's not totally clear to me yet if we can write an allocator which doesn't require intrusive support from the top-level application, but it seems more plausible than when I first opened this issue :)

I'd be ok closing this issue for the topic of memory allocators in that it seems like there may be a viable path forward, and making progress seems to require at least some experimentation. I'm still somewhat worried about the implications of no `atomic.wait` on the main thread more generally, though. It seems ok to wait and see what happens here - whether it's actually a big problem in practice, or whether a solution for memory allocators lends itself to a solution for other usages as well. One aspect that may not translate is that the main thread is empowered in memory allocation to fall back to `memory.grow`, but other shared resources may not have a similar non-blocking escape hatch.
I feel like this issue, and issues that stem from it, are the biggest blocker for bringing existing portable multi-threaded applications to wasm. Emscripten tries to paper over this issue by busy-waiting on the main thread: https://github.com/kripken/emscripten/blob/incoming/src/library_pthread.js#L995 However, for real-world applications, as @lars-t-hansen points out, it can lead to deadlocks. One solution that emscripten has for this is `PROXY_TO_PTHREAD`, which moves the application's `main` off the browser's main thread and into a worker.
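For reference, that busy-wait emulation has roughly the following shape - a hedged sketch in Rust of the general idea, not Emscripten's actual JS implementation (all names here are hypothetical):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
enum WaitResult {
    Ok,       // the value changed (the "woken" case)
    NotEqual, // value != expected on entry
    TimedOut,
}

/// Busy-wait emulation of `memory.atomic.wait32` for a thread that is not
/// allowed to block. Note the downside: while spinning, the thread burns CPU
/// and (on the Web) the browser's main thread still cannot service events.
fn emulated_wait(addr: &AtomicU32, expected: u32, timeout: Duration) -> WaitResult {
    if addr.load(Ordering::SeqCst) != expected {
        return WaitResult::NotEqual;
    }
    let deadline = Instant::now() + timeout;
    while addr.load(Ordering::SeqCst) == expected {
        if Instant::now() >= deadline {
            return WaitResult::TimedOut;
        }
        std::hint::spin_loop();
    }
    WaitResult::Ok
}

fn main() {
    let flag = Arc::new(AtomicU32::new(0));
    let waker = {
        let flag = Arc::clone(&flag);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(10));
            flag.store(1, Ordering::SeqCst); // the "atomic.notify" analogue
        })
    };
    assert_eq!(emulated_wait(&flag, 0, Duration::from_secs(5)), WaitResult::Ok);
    waker.join().unwrap();
    println!("woken by store");
}
```

The deadlock hazard discussed above is visible here: if the waker itself needed the spinning thread to make progress first, neither would ever advance.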
Firefox allows web workers to directly control canvases; maybe more APIs could be added along these lines to keep the entirely-in-worker setup simple and low-latency. Pre-existing multi-threaded codebases that are being ported to WASM probably don't expect to use the DOM much beyond writing to a canvas and reading inputs. Multi-threaded codebases that are written from scratch for heavy direct DOM usage (like rewrites of JavaScript front-end UI frameworks) can be architected from the start to deal with the lack of locks on the main thread. If the codebase is made with the restriction in mind from the start, it's probably easier to accomplish things like each thread having its own memory allocator or similar alternatives.
This may be too off-topic, but one concern I'm having in this area is handling DOM events from a worker wasm context. In order to have cancellable events you need main-thread synchronization, and the only approach I can think of so far is to spin-lock in the JS event handlers. It gets tricky making even simple form handlers that disable buttons reliably during processing without that synchronization. I think this also applies to APIs that require user gestures to perform. Edit: Actually, I remember @developit commenting on a presentation which mentioned an experimental API called transferable events, which may be the right direction for DOM APIs. Maybe purpose-built APIs like transferable events and some allocation API could be the approach, over generic primitives like atomic.wait on the main thread.
@rajsite Re user gesture requirement: I've been working on a framework that runs user code in a web worker, so I've experienced this first hand, and through some research I found this, which I've confirmed solves the user-gesture requirement issue (tested in Chrome with the experimental flag enabled). My biggest concern with disallowing `atomic.wait` on the main thread remains how much of the existing library ecosystem it rules out.
@alexcrichton Maybe a better way is to provide an instruction like `memory.alloc`?
Ok, after thinking this over, I personally think that we're in a good enough position that I'm going to close this issue. While this is still a general problem, I at least personally understand more of the story and it brings us back to a point of "there's a reasonably good story". Notably, the following points are what have changed my mind:
I think that the above points definitely don't put us into an "absolutely amazing" world where everything can just work, but it seems like a good balance between the constraints of the web and a reasonable enough story for synchronization of languages between the main thread and workers. Thanks all for the discussion! I suspect follow-up issues can always be opened if others are interested. Also @lygstate, FWIW I think such an instruction is pretty difficult to provide, and we should be able to effectively do it with `memory.grow` plus allocator code in the toolchain.
This issue is quite old, but I haven't found any new information yet... I am wondering why lock-free allocators are not really discussed here. Those only rely on atomic operations like compare-and-swap, and as far as I understand, those operations are completely legal on the main thread. See mimalloc for example: it uses a thread-local heap, i.e. each thread has its own heap. Allocation works on the thread-local heap. Deallocation works from any thread: a free from the owning thread goes onto that heap's local free list, while a free from any other thread is pushed with a single compare-and-swap onto a separate "thread free" list belonging to the owning heap, which the owner collects later.
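That remote-free scheme can be sketched in Rust along these lines (a toy model with blocks as slab indices, names hypothetical; real mimalloc is considerably more involved): other threads push freed blocks onto an atomic list with compare-and-swap, and the owner swaps the whole list out when it wants to reuse them. No thread ever waits.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

const NIL: usize = usize::MAX;

/// Sketch of mimalloc-style cross-thread deallocation.
/// Blocks are indices into a slab; `next` threads them into lists.
struct Heap {
    next: Vec<AtomicUsize>,
    /// Frees from other threads land here with a single CAS; no lock, no wait.
    thread_free: AtomicUsize,
}

impl Heap {
    fn new(capacity: usize) -> Self {
        Heap {
            next: (0..capacity).map(|_| AtomicUsize::new(NIL)).collect(),
            thread_free: AtomicUsize::new(NIL),
        }
    }

    /// Called by any non-owning thread to free `block`.
    fn remote_free(&self, block: usize) {
        loop {
            let head = self.thread_free.load(Ordering::Acquire);
            self.next[block].store(head, Ordering::Relaxed);
            if self
                .thread_free
                .compare_exchange(head, block, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                return;
            }
        }
    }

    /// Called by the owning thread: atomically take the entire remote list.
    fn collect_thread_free(&self) -> Vec<usize> {
        let mut head = self.thread_free.swap(NIL, Ordering::AcqRel);
        let mut blocks = Vec::new();
        while head != NIL {
            blocks.push(head);
            head = self.next[head].load(Ordering::Relaxed);
        }
        blocks
    }
}

fn main() {
    let heap = Arc::new(Heap::new(8));
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let heap = Arc::clone(&heap);
            thread::spawn(move || heap.remote_free(i)) // workers free blocks
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let mut freed = heap.collect_thread_free();
    freed.sort();
    println!("owner collected {} remote frees", freed.len());
}
```

Since both `remote_free` and `collect_thread_free` are wait-free apart from CAS retries, this pattern is legal on a browser main thread that cannot `atomic.wait`.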
5+ years later this issue is still pertinent and has had a negative impact on the Wasm ecosystem. Rust and Emscripten, two of the most prominent ways to deploy Wasm on the web, still both busy-loop instead of wait, which is strictly worse than if `atomic.wait` were simply permitted.

The effect this limitation has had on the Rust ecosystem is stark: partially due to the clunkiness of working around this constraint, Wasm / Rust multithreading uptake has been slow, and now nearly all Rust libraries bake in the assumption that WebAssembly is single-threaded. There are libraries that adapt around this, but popular projects like the Bevy game engine have not implemented multithreading on web, in part because underlying libraries (like rayon and others) aren't sure how to work around this constraint. This constraint has resulted in a Wasm ecosystem on the web that's leaving significant performance on the table.
I agree this is silly and harmful, in the same boat as the Web's restrictions on synchronous compilation in the JS API. Unfortunately, these restrictions are not necessarily by choice of the Wasm CG but primarily imposed by the groups in control of the Web platform. Hence it would require powerful lobbying to lift them. I wouldn't hold my breath. I'm afraid Wasm cannot fix the Web.
@rossberg as an aside, some browsers are removing that synchronous compilation restriction: https://groups.google.com/a/chromium.org/g/blink-dev/c/nJw2zwaiJ2s/m/EYPgC5D3LwAJ So maybe if enough evidence can be shown, there's hope for a change some day. Wouldn't hold our breath though.
@jayphelps, yeah well, lazy compilation adds a whole new bag of problems with regards to predictable performance and optimisation, especially for the use cases where synchronous compilation would matter most. So that's almost jumping from the frying pan into the fire. But we're off-topic now.
As @rossberg said, it's unlikely that the situation will improve anytime soon, so for the foreseeable future if you want to do multi-threading then you need to run your Rust app off of the main thread. As long as all your Rust code is running in a Worker, then things like rayon work great. It does require some annoying extra setup, and it does make it harder to do main thread things (like the DOM), but it does work. And even with those hurdles, Rust is still the best option for doing multi-threaded Wasm on the web. We can try and do some things to improve the Rust experience (better tooling, better libraries, better docs), but the browser restriction around Workers is something we just can't fix.
Is there a public record, beyond this issue, of this being discussed? Which parties would need to be persuaded? It's hard to imagine why someone would prefer to encourage an ecosystem of busy-loops and hacks over allowing very brief waits.
While I totally understand the frustration here, I'm curious how higher-level libraries such as the ones you mention (Bevy and rayon) suffer from the fact that Rust (and Emscripten) have to perform this busy-wait workaround on the main thread. Isn't that workaround buried deep in the standard library? How does it become observable to higher-level libraries? Are you talking about the performance overhead of not being able to yield?
The Rust standard library on Wasm does not currently busy-wait in its low-level primitives. That logic would go here: https://github.com/rust-lang/rust/blob/e6707df0de337976dce7577e68fc57adcd5e4842/library/std/src/sys/wasm/atomics/futex.rs#L13 Just the other day I proposed changing that: rust-lang/rust#77839 (comment)

I suspect the reason a busy-wait workaround has not been introduced yet is that the relevant Rust maintainers thought a better solution might come along, and they didn't want to introduce any performance footguns or 'hacky' code. Separately, the Rust global allocator on Wasm does work around this by busy-waiting instead of using regular locking primitives.

You can see this impact 'bubbling up' through higher-level libraries, like this task scheduling library Bevy uses: because the library authors were seemingly unfamiliar with the web and unsure how to handle the main thread's inability to wait, they simply crash (even off the main thread!) if the lock can't be acquired. A solution, for just that library, would be to add its own busy-wait workaround. There are other cases of confused library authors throughout the Rust ecosystem who aren't sure what 'hacks' they should use to accommodate the web.
I don't believe the busy-wait workaround can be introduced at the `std` level: the existing Wasm targets aren't Web-only, and `std` has no way of knowing whether it's running on the browser's main thread. So for this workaround to be applied in Rust Std, we would need a dedicated Web target.

Just my two cents here: the busy-loop workaround seems like a terrible idea to me that we should avoid if at all possible, unless we can come up with a nice solution for putting a time limit on it. My guess is that the busy-wait workaround for Wasm is born out of desperation. The problem with the busy-loop workaround is that it can't work entirely reliably on the Web, where e.g. we can't guarantee that things are properly dropped. On the other hand, I have no clue how these busy-loop workarounds are really implemented, and whether there are already good solutions to the problems I'm describing or not.

AFAICS the problem boils down to not having a way to block the current "thread" without blocking the browser's event loop. But I do believe that this is a problem we should and can solve in Wasm (instead of trying to allow blocking on the main thread).
Can you elaborate on the reentrancy problem introduced by stack switching? Beyond the fact that it's necessary to be able to support it anyway (and JSPI does).
For a proper solution we would most likely need something like stack switching, which brings exactly the re-entrancy problems mentioned above. EDIT: it was discussed before to try and address re-entrancy problems in Rust itself, which might be an option here as well ...
It would need to be configurable. The Rust Std could expose a way to set a flag that enables the busy-loop behavior on a thread. This could be behind yet another feature flag, or always enabled when the `atomics` target feature is enabled. Creating another target seems like it'd be a larger maintenance burden, but it is an option. My concern with a new target is exactly that maintenance burden.
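A sketch of what such a flag could look like (entirely hypothetical API; `std` has nothing like this today): the embedder marks the main thread once at startup, and the futex-style wait consults the flag to decide between spinning and truly blocking. The blocking branch is stubbed with `yield_now` so the sketch runs natively; on wasm it would be `memory.atomic.wait32`.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU32, Ordering};

thread_local! {
    // Hypothetical per-thread flag, set once at startup on the browser's
    // main thread. std can't detect this itself, so the embedder sets it.
    static CANNOT_BLOCK: Cell<bool> = Cell::new(false);
}

fn set_cannot_block(value: bool) {
    CANNOT_BLOCK.with(|f| f.set(value));
}

/// Futex-style wait that consults the flag: spin where blocking is forbidden,
/// otherwise fall through to a real blocking wait (stubbed here as a yield).
fn futex_wait(addr: &AtomicU32, expected: u32) {
    let spin_only = CANNOT_BLOCK.with(|f| f.get());
    while addr.load(Ordering::SeqCst) == expected {
        if spin_only {
            std::hint::spin_loop(); // busy-loop branch for the main thread
        } else {
            std::thread::yield_now(); // stand-in for `memory.atomic.wait32`
        }
    }
}

fn main() {
    set_cannot_block(true);
    let flag = AtomicU32::new(1); // already "notified": wait returns at once
    futex_wait(&flag, 0);
    println!("returned without blocking");
}
```

The appeal of a flag over a dedicated target is that the same `std` binary works everywhere; the cost is that correctness now depends on the embedder remembering to set it.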
That would be bad! But acquiring a lock and then calling something that suspends to the event loop would be a bug in the program itself, much like holding a lock across an `await` is today.
I still would like to know more about why this is so futile. Long-running code has a similar effect to waiting on the main thread, but that's permissible because it's necessary. At least in the Rust ecosystem most main-thread waits fall into two categories:

Almost always, code using wait never intends to block the main thread for excessively long, and it's a bug if it does so. Just as it's a bug if code falls into an infinite loop or hits a pathological case that takes too long to compute. Main-thread waits can lead to poor behavior, just as loops can. Both are useful if used properly!
Unfortunately there is currently no way to get around creating a new target. On that note, the component model should hopefully allow Rust to create a dedicated target for the Web platform, which would solve many problems. Fact is, unless Wasm converges on all kinds of topics to be equal on all platforms, we will need dedicated targets.
The JS-Promise integration proposal is already at phase 3, unlike a proposal to allow blocking on the main thread, which doesn't even exist. But my guess would also be that it would have some significant overhead. It would be really nice if we could busy-loop for just a moment until we give up and switch context, but I honestly don't know if there is a proper way to define "a moment". Maybe this would need a dedicated Wasm proposal after all, but yeah, sounds tough.
A library can't prevent a user from using blocking primitives on the main thread anyway. But agreed, the status quo is pretty bad, and we would have to make significant trade-offs with what we have right now to fix this, even if temporarily.
This is just my impression! I'm literally a nobody in this scene, it's not like I have any say here. On that note, it should be perfectly possible to make a Wasm post-processor that just replaces every call to `memory.atomic.wait32`/`memory.atomic.wait64` with an equivalent busy-loop.
I've started to experiment with wasm threads, Rust, and `wasm-bindgen` recently to see how well our story shapes up there. The good news is that it's all working pretty well! On basically every demo I've written so far, though, I've very quickly run up against the wall of `atomic.wait` instructions not being allowed on the main thread (they throw an error). I'm currently testing in Firefox, but I think this behavior is mirrored in other implementations?

On the surface and in the abstract, the lack of `atomic.wait` definitely makes sense. Reducing jank is always good! In practice, though, I've found this severely limiting when trying to write applications. The use case I'm exploring currently is to instantiate a `WebAssembly.Instance` on the main thread, and then `postMessage` that instance's module and its shared memory to a set of worker threads. That way the main wasm thread (the main application) can enjoy features like DOM access while the worker threads do the workhorse of all the work. In this model, some gotchas arise pretty quickly.

Most of the gotchas can be categorized as "it's really hard for libraries to avoid blocking synchronization". All code executed on the main thread, and all libraries it links to, can't use any form of blocking synchronization (like mutexes). Some cases where this comes up quickly are:

- Memory allocators - the Rust standard library provides a global memory allocator, for example, which is currently a translation of `dlmalloc`. To make this safe to use in a multithreaded scenario, access to the global allocator is synchronized with a mutex. (I can't really imagine a world where memory allocation is asynchronous...) It's really hard for the main thread to entirely avoid allocating memory, or for sub-workers to all avoid allocating memory.

- Synchronizing messages - one of the first problems I ran into was accidentally attempting to lock memory to read it on the main thread. Without `atomic.wait` the only way (I think?) for a worker to synchronize with the main thread (aka wake it up to an event) is via `postMessage`. A worker (in the abstract) doesn't even know if that'll wake up the main thread as well! (sub-workers and such).

While it's not the worst thing in the world to provide custom synchronization at the app level, this makes me very wary to use any library that has synchronization at all on the main thread. If any library anywhere uses a mutex, even if just for a short period of time, it's not usable on the main thread as it may occasionally throw an exception. Put another way, it seems like all existing threading-related libraries almost cannot be used by default. Even libraries that provide the ability to specify a custom method to send notifications are at risk of using a mutex for short periods of time to protect some data.

Putting this all together seems like it basically means that the entire main thread for an application has to be entirely user-written and use very few libraries (only those audited to be used on the main thread or saying they don't have synchronization). Even then, I'm not sure how the memory allocation issue would be solved. Additionally, it seems like synchronization primitives will almost always have to be hand-rolled for each application, always using `postMessage` to communicate from the main thread to workers and back.

Coming out of this are a few questions. Ideally these problems could be solved by simply saying "`atomic.wait` is ok on the main thread", but that of course brings back the jank problem. Some of the possible solutions (like for the memory allocator problem) could be "just use a spin lock if it's short", but I'm not sure how that's better than just allowing `atomic.wait` on the main thread? Maybe there's recourse for something like "you can use `atomic.wait` on the main thread only if you specify a small timeout". For example, it takes Firefox N seconds to say "your script is slowing the page down" - could that be the maximum timeout for `atomic.wait`?

In general I'm also curious to hear others' thoughts on this as well. Is sharing a wasm module on the main thread with worker threads just a pipe dream? Are there other ways to work around this issue?