-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wasm workers #12833
Wasm workers #12833
Conversation
Neat!
It should be trivial to support at least C++11 thread_local, and Clang/GCC __thread. Why not support at least those? Also, what are the barriers preventing this from being a user-level library rather than a new Emscripten option? In an ideal world, this seems like the kind of thing users should be able to create and use without having to change Emscripten. |
Very good point, let me clarify: the guiding philosophy here is the same as with MINIMAL_RUNTIME: "perfect DCEability" - if you don't use a certain feature, you should not pay any code size for it. Since we are a generic compiler that people use for so many different things, the intent is that we should only impose the minimal amount of undceable runtime on them. That helps combat the perceived bloat that Emscripten has historically had a ballast from the POSIX compatibility support days, and it massively helps learnability/educatability, since when people unminify the generated code or wasm-dis the generated .wasm file, they will see only the things they added to the code and practically very little extra, making it easier to follow what the compiler is generating. That being said, adding any features like C++11 thread_local on top of the base API would certainly be ok, as long as it follows this "zero code size when not used" rule. C++11 is not something that I am initially concerned here, but the concern is more about "what does the absolute minimal required runtime for sharing a Module+Memory look like that won't get in anyone's way?"
You can practically observe what the obstacles are from this PR. Commented in this PR with (*) the different parts:
So basically parts of the compiler toolchain and core/built-in runtime code need to behave a bit differently when targeting this, that requires adding a new flag. The idea here is that the WASM_WORKERS=1 flag would add the minimal scaffolding that will enable people to achieve this. |
I am in favor of things like this. But I think it should not be orthogonal to our pthreads support. That is, a new Web-friendly threads API makes a lot of sense, and our full-POSIX pthreads support should use it as much as possible. Obviously it can't do everything, but I suspect at least the worker pool could be done in C using this API, which would be a nice simplification for pthreads, and get useful testing for the new API. It would also prove that the API is useful. |
e518161
to
246211d
Compare
007d8d0
to
e163422
Compare
I've got this branch really close to landing, but one remaining issue that is bugging me is that the CI is unable to run the Wasm Worker tests. When I locally test the branch on three different systems, on a Windows, Linux and a Mac computers, they each are able to run the tests fine. But on the CI, the Wasm Worker tests time out, and on some runs there is a logged error
which is a really peculiar error to see, since the error suggests that the CI compiles the code differently than what I get locally.. I wonder if either of you would be able to catch why this would happen just on the CI? |
Regarding the specific Did you know you can ssh into the CI bots try to debug stuff? There is "Re-run with SSH" button on the top left. If you can make the tests run under node that become more useful. |
That is coming from the worker. Main thread does not execute that function.
Thanks, SSHd in there, and it looks like the CI just decides not to include the symbol.. here is a local build on the left, and a build on the CI on the right: I find that there are multiple checkouts of Emscripten (one in However in the build itself, the file If I clear FROZEN_CACHE in .emscripten, then do a
then the missing symbol does appear in the build: So I suspect this issue is something to do about the FROZEN_CACHE and/or embuilder mechanism to prepopulate the cache. |
Although the whole reason that I added a new function to stack_limits.S is that I was unable to reference the existing variables I'd be happy to put that file in it own |
I think you don't need direct access to __stack_base and __stack_end if you use the
|
Once you switch to |
Ok, now I am able to reproduce locally. If I follow the same sequence of embuilder build commands that the CI does and then freeze the cache, then I get the same bad behavior on my Windows box. I should hopefully be able to diagnose this locally now. |
Can you share the code you used for "(current pthreads minimal runtime hello world)". I'd be curious to re-run it today and see what size it is. |
I think I did measured |
This is a good idea, but I'll leave this kind of work later, it looks like there are a lot more of musl that could be stripped away in this kind of scenario. |
@@ -77,6 +77,35 @@ emscripten_stack_get_free: | |||
PTR.sub | |||
end_function | |||
|
|||
#ifdef __EMSCRIPTEN_WASM_WORKERS__ | |||
# TODO: Relocate the following to its own file wasm_worker.S, but need to figure out how to reference | |||
# __stack_base and __stack_end globals from a separate file as externs in order for that to work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can call emscripten_stack_set_limits
from the other file to set both __stack_base
and __stack_end
without needing direct access to the symbols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This structure is used intentionally/specifically to avoid doing multiple JS -> Wasm function calls at initialization. Just one is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is that emscripten_wasm_worker_initialize
can call emscripten_stack_set_limits
. Thats just one native assembly function calling another. No JS involved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see.. Well I don't think it is necessary to do that, that would take away from the compactness of that function call, which is packed tightly with the use of local.tee
to initialize all the variables.
Oops, I didn't actually mean to click that approve button yet. |
emscripten_wasm_wait_i32(&addr, 0, nsecs); | ||
} | ||
|
||
void emscripten_lock_init(emscripten_lock_t *lock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating a whole new set of emscripten_*
functions as alternatives to the pthread API, what if we wrote and linked in an alternative implementation of the pthread API? That would make it easier to port code to run inside a Wasm Worker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we wanted to improve the ability to port code to target wasm workers, I think the route to go about that is to implement a new pthread library implementation that backs on top of wasm workers, not one that would replace exposing wasm workers.
Not exposing a direct API here would be a detriment to the simplicity that is gained with exposing direct web primitives, and also to generated code size. When one talks through an existing native Posix API that does not match the web primitives, there is then unfortunately some API inefficiency and cognitive inefficiency that are unavoidable to happen.
The existing Pthread library is already all about the ease of porting. In the past decade we worked a lot to make it behave as 1:1 to native Pthreads as possible. Portability is therefore not a big target for Wasm Workers, but instead directly exposing what web offers is, and the simplicity and minimalness in code size that is gained as a result is.
So it could be an interesting experiment to see what a Wasm Workers based pthreads library would look like, and what would happen e.g. if pthread cancellation points model was removed from its operation altogether. But I think that kind of experiment should not affect how the wasm workers API surface area is publicized, but it is good to stand on its own.
I see, how would you like to proceed? I think I resolved all the review points that you had put out before. Do you think there are substantially more? I would be happy to continue iterating in the tree, and then features like #16449 can also proceed (which has been pending for a long time since #12502 ), but I suppose it depends on the extent of the remaining review? |
Could somebody please kill or fix this page? That 'Quick Example' is a disaster. wasm_worker.h, emscripten_malloc_wasm_worker(), and -sWASM_WORKERS don't even exist. I've litterally spent hours and days trying to get wasm workers to work with enscriptem and my React app, going back and forth between emscripten_create_worker() and just creating a WW in JS, and trying to get a C++ function call to work. (My needs are simple - I really only need 1 call, then the thread spends all its time in C++ until page reload.) |
Perhaps you are using an older version of emscripten that doesn't have wasm workers. What version are you using? (what does
Lines 1531 to 1535 in fbc5d5c
If you have a simple use case you might prefer to use pthreads which have been around a lot longer and is well tested. I would say that wasm workers is still a more of a niche use case. |
well, I just pulled a few days ago, so emcc --version says When I added '-sWASM_WORKERS' to my emcc script, I got this out:
When I tried '--WASM_WORKERS' I got: -BUILD_AS_WORKER=1, I ended up using that. Says you need that for 'emscripten_create_worker()'. But I couldn't make a script that let me do emscripten in its space, so I took it out. I've since moved on to --proxy-to-worker, and it seems to work. I do have one worker; now i just have to get it to run some code. You should really try out the example at the top of https://emscripten.org/docs/api_reference/wasm_workers.html |
The WASM_WORKERS feature (and setting) was added in this PR (#12833), which first landed in 3.1.8 (which was released about 11 months ago). Any version of emscripten prior to that won't work. |
I get it. I'm running 3.1.7 . I'm pulling from https://github.com/emscripten-core/emsdk.git , is that not the latest? |
If you run |
This PR adds a new "Wasm Workers" API to Emscripten.
The intent is to be an alternative to the pthread API, for cases where one wants to build on top of "directly exposed" Web Workers abstraction, instead of using the Pthread API from the POSIX world.
Currently still very WIP, but adding the code so far since we discussed this recently, so people can go through the code if interested. CC @kripken @sbc100 @dschuff.Design goals:
setInterval
polling since not yet available in browsers. CC @lars-t-hansen @syg)Code sizes:
Wasm Worker API consists of three major parts:
Comparison between pthreads and Wasm Workers:
Both can use emscripten_atomic_{exchange, cas, load, store, fence, add, sub, and, or, xor} API.
Both types of threads have a local stack in wasm heap for execution.
Both types of threads can run both an infinite loop programming model and an event-based programming model.
Both can use EM_ASM and EM_JS API, executed in the Worker's own global scope.
Both can call out to library_*.js JavaScript library functions code that do not use proxying, executed in the Worker's global scope.
Only pthreads can run
Wasm Workers cannot (currently at least)use any of the pthread_* API calls. Wasm Workers come with a custom Emscripten synchronization primitives API (pthreads could call into that sync primitives API)
Main thread cannot acquire synchronization variables directly: (unlike main thread can do with pthread_mutex_acquire)
emscripten_lock_wait_acquire: wasm worker only
emscripten_lock_waitinf_acquire: wasm worker only
emscripten_lock_try_acquire: main thread + wasm worker
emscripten_lock_async_acquire: main thread + wasm worker
Pthreads support cancellation points (pthread_cancel(), pthread_testcancel()). Wasm Workers do not have that concept.
Wasm workers have a topology, unlike pthreads.
-> Wasm workers cannot spawn Wasm workers in Safari (https://bugs.webkit.org/show_bug.cgi?id=22723)
-> a polyfill can be used: https://github.com/dmihal/Subworkers
To implement flat topology like with pthreads, proxy the Wasm Worker creation calls yourself.
Pthread Workers are pooled, and have two distinct states: hosting a live pthread vs dormant.
Wasm Workers do not have this concept.
Pooled pthreads support synchronous thread startup. Wasm Workers always start up asynchronously.
Pthreads have a thread main function. Wasm Workers do not.
Likewise, pthreads by default exit when falling off main. Wasm Workers exit only when parent thread calls emscripten_terminate_wasm_worker().
Pthreads have a Wasm-backed function call message queue to improve on the very slow performance of postMessage() in browsers and to implement a proxying bus.
Wasm Workers do not have a message queue (roll your own if needed)
It results that emscripten_set_x_callback_on_thread() functions do not function into Wasm Workers.
Pthreads synchronize the timing values of emscripten_get_now() across pthreads for consistent wallclock time between pthreads and main thread. Wasm Workers do not.
Pthreads have a concept of identity/thread ID (pthread_self()).
Wasm Workers: implement one yourself if neededUPDATE: Wasm Workers have functionemscripten_wasm_worker_self_id()
-> Pthread mutexes can be recursive (optional mutex init attribute), Wasm Worker lock cannot
-> Pthread mutexes are guarded to not be able to release a lock on behalf of another thread, Wasm Worker locks do not guard for this
Pthreads support pthread_tls_* API, C++11 thread_local, and Clang/GCC __thread keyword.
Wasm Workers do not.UPDATE: Wasm Workers also support
thread_local
,__thread
and_Thread_local
, but notpthread_tls_*
API.~~To implement clean DCE-able TLS support with a
style of syntax for minimal code size, planning to use #12793~~
UPDATE: The above never really materialized, but not a big deal, since
thread_local
,__thread
and_Thread_local
are all supported, and also providing an example how to manually implement Wasm Globals based TLS variables.Usage:
-s WASM_WORKERS=1 enables targeting Wasm Workers.
EMSCRIPTEN_PTHREADS will not be defined in C/C++ compilation units when targeting Wasm Workers.UPDATE: EMSCRIPTEN_PTHREADS is defined when targeting Wasm Workers. However this should probably not be relied on, it would be good to change this in upstream LLVM/Clang to not be the case.
A new preprocessor #define EMSCRIPTEN_SHARED_MEMORY will be defined for both -pthread and -s WASM_WORKERS=1 builds.
SINGLE_FILE=1 is not supported with Wasm Workers (like it is not supported with pthreads)
LINKABLE, SIDE_MODULE and MAIN_MODULE=1, not supported (like it is not supported with pthreads either)
PROXY_TO_WORKER: Not supported (could be done, but needs a zero code size cost solution when not enabled)
-> Do note Safari nested Workers problem
Open questions:
USE_PTHREADS=1 + WASM_WORKERS=1 usage at the same time?UPDATE: Yes, these are supported at the same time.ALLOW_MEMORY_GROWTH=1 + WASM_WORKERS=1? Improved heap growth check in pthreads with memory growth enabled? #12783UPDATE: Yes, supported.smaller runtime memory usage with Wasm Workers by generating a custom .js file for them? (instead of reloading the main thread one?)UPDATE: This is currently not viable and not pursued.Plannednew strict emitted code size tests:Hello OffscreenCanvas Wasm Workers