Wasm workers #12833

juj · 2020-11-19T20:05:38Z

This PR adds a new "Wasm Workers" API to Emscripten.

The intent is to be an alternative to the pthread API, for cases where one wants to build on top of a "directly exposed" Web Workers abstraction, instead of using the Pthread API from the POSIX world.

~~Currently still very WIP, but adding the code so far since we discussed this recently, so people can go through the code if interested. CC @kripken @sbc100 @dschuff.~~

Design goals:

minimal emitted code size (only pay code size for what you use) -> uses a maximally JS library based approach that DCEs well.
directly expose existing web constructs, complements the POSIX Pthreads API that approaches threading from the opposite portability perspective.
implement waitAsync-based synchronization primitives to avoid locking on the main thread. (though currently polyfills with a brutish setInterval polling since not yet available in browsers. CC @lars-t-hansen @syg)

Code sizes:

minimal runtime singlethreaded hello world: 1225 bytes
minimal runtime multithreaded wasm workers hello world: 2436 bytes
(current pthreads minimal runtime hello world: 43082 bytes, but we haven't really optimized that much for size)

Wasm Worker API consists of three major parts:

Worker control:

emscripten_wasm_worker_t emscripten_create_wasm_worker(void *stackLowestAddress, uint32_t stackSize);
void emscripten_terminate_wasm_worker(emscripten_wasm_worker_t id);
void emscripten_terminate_all_wasm_workers(void);
EM_BOOL emscripten_current_thread_is_wasm_worker(void);

PostMessages:

void emscripten_wasm_worker_post_function_v(emscripten_wasm_worker_t id, void (*funcPtr)(void));
void emscripten_wasm_worker_post_function_vi(emscripten_wasm_worker_t id, void (*funcPtr)(int), int arg0);
void emscripten_wasm_worker_post_function_vii(emscripten_wasm_worker_t id, void (*funcPtr)(int, int), int arg0, int arg1);
void emscripten_wasm_worker_post_function_viii(emscripten_wasm_worker_t id, void (*funcPtr)(int, int, int), int arg0, int arg1, int arg2);
void emscripten_wasm_worker_post_function_vd(emscripten_wasm_worker_t id, void (*funcPtr)(double), double arg0);
void emscripten_wasm_worker_post_function_vdd(emscripten_wasm_worker_t id, void (*funcPtr)(double, double), double arg0, double arg1);
void emscripten_wasm_worker_post_function_vddd(emscripten_wasm_worker_t id, void (*funcPtr)(double, double, double), double arg0, double arg1, double arg2);
void emscripten_wasm_worker_post_function_sig(emscripten_wasm_worker_t id, void *funcPtr, const char *sig, ...);
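
The _sig variant above takes a signature string plus varargs. As a rough illustration of how such a string could drive argument unpacking, here is a minimal sketch in portable C. It is not the implementation in this PR: the names demo_arg_t and demo_unpack_sig are hypothetical, and the convention that the first character is the return type ('v') followed by 'i' for int and 'd' for double is an assumption made for the example.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical tagged value, used only by this sketch. */
typedef struct {
    char kind;                      /* 'i' for int, 'd' for double */
    union { int i; double d; } v;
} demo_arg_t;

/* Walks a signature such as "vid": sig[0] is assumed to be the return type
   and every later character describes one variadic argument.
   Returns the number of arguments stored in out, or -1 if the signature
   contains an unsupported character or does not fit in max_args. */
int demo_unpack_sig(const char *sig, demo_arg_t *out, size_t max_args, ...)
{
    if (sig == NULL || sig[0] != 'v') return -1;

    va_list ap;
    va_start(ap, max_args);
    int count = 0;
    for (const char *p = sig + 1; *p != '\0'; ++p) {
        if ((size_t)count >= max_args) { count = -1; break; }
        out[count].kind = *p;
        if (*p == 'i') {
            out[count].v.i = va_arg(ap, int);
        } else if (*p == 'd') {
            out[count].v.d = va_arg(ap, double);
        } else {
            count = -1;             /* unsupported type character */
            break;
        }
        ++count;
    }
    va_end(ap);
    return count;
}
```

Once unpacked like this, the values could be copied into a message for the target Worker; the fixed-signature variants (_vi, _vdd, ...) avoid the string walk entirely.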

Synchronization Primitives API:

void emscripten_lock_init(emscripten_lock_t *lock);
EM_BOOL emscripten_lock_try_acquire(emscripten_lock_t *lock);
EM_BOOL emscripten_lock_wait_acquire(emscripten_lock_t *lock, double maxWaitMilliseconds);
void emscripten_lock_waitinf_acquire(emscripten_lock_t *lock);
ATOMICS_WAIT_RESULT_T emscripten_lock_async_acquire(emscripten_lock_t *lock,
                                                    void (*asyncWaitFinished)(volatile void *addr, uint32_t val, ATOMICS_WAIT_RESULT_T waitResult, void *userData),
                                                    void *userData,
                                                    double maxWaitMilliseconds);

in total: lock, spinlock, semaphore, condition variable, rwlock, wrlock, barrier.

Comparison between pthreads and Wasm Workers:

Both can use emscripten_atomic_{exchange, cas, load, store, fence, add, sub, and, or, xor} API.
Both types of threads have a local stack in wasm heap for execution.
Both types of threads can run both an infinite loop programming model and an event-based programming model.
Both can use EM_ASM and EM_JS API, executed in the Worker's own global scope.
Both can call out to library_*.js JavaScript library functions code that do not use proxying, executed in the Worker's global scope.

Only pthreads can run

sync and async proxied JS functions,
MAIN_THREAD_EM_ASM() code blocks

Wasm Workers cannot (currently at least) use any of the pthread_* API calls. Wasm Workers come with a custom Emscripten synchronization primitives API (pthreads could call into that sync primitives API).

The main thread cannot block to acquire a lock (unlike with pthreads, where the main thread can call pthread_mutex_lock()):

emscripten_lock_wait_acquire: wasm worker only
emscripten_lock_waitinf_acquire: wasm worker only
emscripten_lock_try_acquire: main thread + wasm worker
emscripten_lock_async_acquire: main thread + wasm worker
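
To make the distinction concrete: a try-acquire never blocks, which is why it is safe on the main thread. Below is a minimal sketch of a non-blocking lock written with C11 atomics. It is an illustration under assumptions, not the code in this PR; the demo_* names are hypothetical and the lock word encoding (0 = free, 1 = held) is chosen for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
typedef _Atomic uint32_t demo_lock_t;

void demo_lock_init(demo_lock_t *lock)
{
    atomic_init(lock, 0);
}

/* Attempts to take the lock once and returns immediately.
   Returns true if the caller now holds the lock. */
bool demo_lock_try_acquire(demo_lock_t *lock)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, 1);
}

/* Releases the lock. Like the Wasm Worker locks described in this PR,
   this sketch does not check who the owner is. */
void demo_lock_release(demo_lock_t *lock)
{
    atomic_store(lock, 0);
}
```

A blocking acquire would loop on the compare-exchange and sleep with a wait primitive in between, which is the part that is only allowed in a Worker.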

Pthreads support cancellation points (pthread_cancel(), pthread_testcancel()). Wasm Workers do not have that concept.

Wasm workers have a topology, unlike pthreads.
-> Wasm workers cannot spawn Wasm workers in Safari (https://bugs.webkit.org/show_bug.cgi?id=22723)
-> a polyfill can be used: https://github.com/dmihal/Subworkers

To implement flat topology like with pthreads, proxy the Wasm Worker creation calls yourself.

Pthread Workers are pooled, and have two distinct states: hosting a live pthread vs dormant.
Wasm Workers do not have this concept.

Pooled pthreads support synchronous thread startup. Wasm Workers always start up asynchronously.

Pthreads have a thread main function. Wasm Workers do not.

Likewise, pthreads by default exit when falling off main. Wasm Workers exit only when parent thread calls emscripten_terminate_wasm_worker().

Pthreads have a Wasm-backed function call message queue to improve on the very slow performance of postMessage() in browsers and to implement a proxying bus.

Wasm Workers do not have a message queue (roll your own if needed)
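
For readers wondering what "roll your own" might look like, one possible shape is a single-producer, single-consumer ring buffer living in shared memory. The sketch below uses only C11 atomics; the demo_queue_* names and the fixed capacity are assumptions for illustration, and nothing like this is provided by the PR itself.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_QUEUE_CAPACITY 8u      /* must be a power of two */

/* Hypothetical SPSC queue: exactly one thread pushes, exactly one pops. */
typedef struct {
    int slots[DEMO_QUEUE_CAPACITY];
    _Atomic uint32_t head;          /* next index to read, advanced by the consumer */
    _Atomic uint32_t tail;          /* next index to write, advanced by the producer */
} demo_queue_t;

void demo_queue_init(demo_queue_t *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side. Returns false if the queue is full. */
bool demo_queue_push(demo_queue_t *q, int value)
{
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == DEMO_QUEUE_CAPACITY) return false;
    q->slots[tail & (DEMO_QUEUE_CAPACITY - 1u)] = value;
    /* Release so the consumer sees the slot contents before the new tail. */
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the queue is empty. */
bool demo_queue_pop(demo_queue_t *q, int *value)
{
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail) return false;
    *value = q->slots[head & (DEMO_QUEUE_CAPACITY - 1u)];
    /* Release so the producer can safely reuse the slot. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}
```

The consumer still needs some way to learn that work has arrived, for example by polling or by pairing the queue with a wait/notify on the tail word.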

As a result, the emscripten_set_x_callback_on_thread() functions do not work with Wasm Workers.

Pthreads synchronize the timing values of emscripten_get_now() across pthreads for consistent wallclock time between pthreads and main thread. Wasm Workers do not.

Pthreads have a concept of identity/thread ID (pthread_self()). ~~Wasm Workers: implement one yourself if needed~~ UPDATE: Wasm Workers have function emscripten_wasm_worker_self_id()
-> Pthread mutexes can be recursive (optional mutex init attribute), Wasm Worker lock cannot
-> Pthread mutexes are guarded to not be able to release a lock on behalf of another thread, Wasm Worker locks do not guard for this
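
If owner checking is wanted, it can be layered on top by storing an ID in the lock word. The following is a minimal sketch under assumptions: the demo_owned_* names are hypothetical, the caller passes its own ID explicitly, and IDs are assumed to be nonzero so that 0 can mean "free" (an ID source that can return 0 would need an offset).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: 0 = free, otherwise the ID of the current owner. */
typedef _Atomic uint32_t demo_owned_lock_t;

void demo_owned_init(demo_owned_lock_t *lock)
{
    atomic_init(lock, 0);
}

/* Non-blocking acquire that records who took the lock. */
bool demo_owned_try_acquire(demo_owned_lock_t *lock, uint32_t self_id)
{
    uint32_t expected = 0;
    return self_id != 0 &&
           atomic_compare_exchange_strong(lock, &expected, self_id);
}

/* Release succeeds only if self_id is the recorded owner. */
bool demo_owned_release(demo_owned_lock_t *lock, uint32_t self_id)
{
    uint32_t expected = self_id;
    return self_id != 0 &&
           atomic_compare_exchange_strong(lock, &expected, 0);
}
```

The extra compare on release is the kind of cost the plain Wasm Worker lock avoids by not guarding ownership.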

Pthreads support pthread_tls_* API, C++11 thread_local, and Clang/GCC __thread keyword. ~~Wasm Workers do not.~~
UPDATE: Wasm Workers also support thread_local, __thread and _Thread_local, but not pthread_tls_* API.

~~To implement clean DCE-able TLS support with a

  int __attribute__((wasm_global)) threadIdx;

style of syntax for minimal code size, planning to use #12793~~

UPDATE: The above never really materialized, but that is not a big deal, since thread_local, __thread and _Thread_local are all supported, and the PR also provides an example of how to manually implement Wasm Globals based TLS variables.

Usage:
-s WASM_WORKERS=1 enables targeting Wasm Workers.
__EMSCRIPTEN_PTHREADS__ will not be defined in C/C++ compilation units when targeting Wasm Workers.
UPDATE: __EMSCRIPTEN_PTHREADS__ is defined when targeting Wasm Workers. However this should probably not be relied on; it would be good to change this in upstream LLVM/Clang to not be the case.

A new preprocessor #define __EMSCRIPTEN_SHARED_MEMORY__ will be defined for both -pthread and -s WASM_WORKERS=1 builds.
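
A short sketch of how code might branch on these defines is below. It assumes the macros are spelled with leading and trailing double underscores, in the same style as the __EMSCRIPTEN_WASM_WORKERS__ guard in the stack_limits.S diff later in this thread; the DEMO_* names and demo_threading_features() are hypothetical.

```c
/* Hypothetical feature bits for this sketch. */
enum {
    DEMO_HAS_SHARED_MEMORY = 1u,    /* -pthread or -s WASM_WORKERS=1 build */
    DEMO_HAS_WASM_WORKERS  = 2u     /* -s WASM_WORKERS=1 build */
};

/* Reports which threading-related defines were active at compile time.
   In a build without any of them (including native builds) this returns 0. */
unsigned demo_threading_features(void)
{
    unsigned features = 0;
#ifdef __EMSCRIPTEN_SHARED_MEMORY__
    features |= DEMO_HAS_SHARED_MEMORY;
#endif
#ifdef __EMSCRIPTEN_WASM_WORKERS__
    features |= DEMO_HAS_WASM_WORKERS;
#endif
    return features;
}
```

Testing for shared memory rather than for pthreads is the safer choice for code that only needs atomics, since both build modes are meant to set that define.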

SINGLE_FILE=1 is not supported with Wasm Workers (like it is not supported with pthreads)
LINKABLE, SIDE_MODULE and MAIN_MODULE=1 are not supported (they are not supported with pthreads either)
PROXY_TO_WORKER: Not supported (could be done, but needs a zero code size cost solution when not enabled)
-> Do note Safari nested Workers problem

Open questions:

~~USE_PTHREADS=1 + WASM_WORKERS=1 usage at the same time?~~ UPDATE: Yes, these are supported at the same time.
~~ALLOW_MEMORY_GROWTH=1 + WASM_WORKERS=1? Improved heap growth check in pthreads with memory growth enabled? #12783~~ UPDATE: Yes, supported.
~~smaller runtime memory usage with Wasm Workers by generating a custom .js file for them? (instead of reloading the main thread one?)~~ UPDATE: This is currently not viable and not pursued.

~~Planned~~ new strict emitted code size tests:

Hello World Wasm Workers
~~Hello OffscreenCanvas Wasm Workers~~

tlively · 2020-11-20T00:23:16Z

Neat!

Pthreads support pthread_tls_* API, C++11 thread_local, and Clang/GCC __thread keyword. Wasm Workers do not.

It should be trivial to support at least C++11 thread_local, and Clang/GCC __thread. Why not support at least those?

Also, what are the barriers preventing this from being a user-level library rather than a new Emscripten option? In an ideal world, this seems like the kind of thing users should be able to create and use without having to change Emscripten.

juj · 2020-11-20T08:56:07Z

It should be trivial to support at least C++11 thread_local, and Clang/GCC __thread. Why not support at least those?

Very good point, let me clarify: the guiding philosophy here is the same as with MINIMAL_RUNTIME: "perfect DCEability" - if you don't use a certain feature, you should not pay any code size for it. Since we are a generic compiler that people use for so many different things, the intent is that we should only impose the minimal amount of undceable runtime on them.

That helps combat the perceived bloat that Emscripten has historically carried as ballast from the POSIX compatibility support days, and it massively helps learnability/educatability, since when people unminify the generated code or wasm-dis the generated .wasm file, they will see only the things they added to the code and practically very little extra, making it easier to follow what the compiler is generating.

That being said, adding any features like C++11 thread_local on top of the base API would certainly be ok, as long as it follows this "zero code size when not used" rule.

C++11 is not something that I am initially concerned with here; the concern is more about "what does the absolute minimal required runtime for sharing a Module+Memory look like that won't get in anyone's way?"

Also, what are the barriers preventing this from being a user-level library rather than a new Emscripten option? In an ideal world, this seems like the kind of thing users should be able to create and use without having to change Emscripten.

You can practically observe what the obstacles are from this PR. I have marked the different parts in this PR with (*):

We need to enable targeting shared Memory generation and LLVM atomics (and bulk memory), without bringing in the full pthreads runtime
We need to generate a startup bootstrap script for the Workers (people could be asked to put this together themselves separate to the build process I suppose, but it would be more clumsy to require it)
We need to adjust the startup runtime .js to behave differently in Workers compared to main thread: it should not initialize the global data ctors etc. that are only the role of the main thread.
We need a way to reuse existing provided shared memory, or allocate a new one.
We need to create a shared Memory when targeting SAB+Atomics
We need a boolean bit of info to distinguish whether we are the main thread or the Worker so that we know how to route the different initialization based on this. (we could use var ENVIRONMENT_IS_WASM_WORKER = typeof importScripts !== 'undefined'; here, but eyeing towards enabling possible -s PROXY_TO_WORKER=1 use cases, this may be better - it's also smaller code size wise)
We should not launch main() in Workers.
We should adjust the download sequence so that the necessary .js files are kept in memory and loaded up by the Workers without pinging the remote server for each Worker startup. (that is dramatically slower e.g. for 16+ workers)
We should allow using atomics without the pthreads API

So basically parts of the compiler toolchain and core/built-in runtime code need to behave a bit differently when targeting this, which requires adding a new flag. The idea here is that the WASM_WORKERS=1 flag would add the minimal scaffolding that will enable people to achieve this.

kripken · 2020-11-23T17:24:14Z

I am in favor of things like this. But I think it should not be orthogonal to our pthreads support. That is, a new Web-friendly threads API makes a lot of sense, and our full-POSIX pthreads support should use it as much as possible. Obviously it can't do everything, but I suspect at least the worker pool could be done in C using this API, which would be a nice simplification for pthreads, and get useful testing for the new API. It would also prove that the API is useful.

juj · 2022-02-22T12:55:59Z

@kripken @sbc100

I've got this branch really close to landing, but one remaining issue that is bugging me is that the CI is unable to run the Wasm Worker tests.

When I locally test the branch on three different systems (Windows, Linux and Mac computers), each of them is able to run the tests fine. But on the CI, the Wasm Worker tests time out, and on some runs there is a logged error

test_wasm_worker_embedded (test_browser.browser) ... [0222/101629.679168:INFO:CONSOLE(71)] "got top level error: Uncaught ReferenceError: _emscripten_wasm_worker_initialize is not defined", source: http://localhost:8888/test.js (71)
[0222/101629.679424:INFO:CONSOLE(1379)] "Page threw an exception [object ErrorEvent]", source: http://localhost:8888/test.js (1379)
[0222/101629.679448:INFO:CONSOLE(2064)] "Uncaught ReferenceError: _emscripten_wasm_worker_initialize is not defined", source: http://localhost:8888/test.js (2064)
[0222/101629.679502:INFO:CONSOLE(2064)] "Uncaught ReferenceError: _emscripten_wasm_worker_initialize is not defined", source: http://localhost:8888/test.js (2064)
[client logging: /?exception=Uncaught ReferenceError: _emscripten_wasm_worker_initialize is not defined / undefined ]
[unresponsive tests: 1]
[test error (see below), automatically retrying]
Expected to find '/report_result?0
' in '[no http server activity]
', diff:

--- expected
+++ actual
@@ -1 +1 @@
-/report_result?0
+[no http server activity]

which is a really peculiar error to see, since the error suggests that the CI compiles the code differently than what I get locally.. emscripten_wasm_worker_initialize is included in the generated builds on all of my Windows, Linux and Mac systems, so I am scratching my head as to why only the CI would suddenly do builds that don't include that symbol?

I wonder if either of you would be able to catch why this would happen just on the CI?

sbc100 · 2022-02-22T15:22:36Z

Regarding the specific _emscripten_wasm_worker_initialize issue... do you know if that is coming from the main page or from a worker? I can't see how that could happen on the main page since EXPORTED_FUNCTIONS includes that symbol.

Did you know you can ssh into the CI bots to try to debug stuff? There is a "Re-run with SSH" button on the top left. If you can make the tests run under node, that becomes more useful.

juj · 2022-02-22T16:43:36Z

that is coming from the main page or from a worker?

That is coming from the worker. Main thread does not execute that function.

Did you know you can ssh into the CI bots to try to debug stuff? There is a "Re-run with SSH" button on the top left. If you can make the tests run under node, that becomes more useful.

Thanks, SSHd in there, and it looks like the CI just decides not to include the symbol.. here is a local build on the left, and a build on the CI on the right:

I find that there are multiple checkouts of Emscripten (one in /emsdk/upstream, and another in /project/) and multiple sysroot cache directories (one in /root/cache, another one in /cache/). That seems like a bug? It would be good not to have multiple checkouts on the CI?

However in the build itself, the file stack_limits.S in file libcompiler_rt-ww.a has not gotten built with the newly added symbol, hence the missing function.

If I clear FROZEN_CACHE in .emscripten, then do an emcc --clear-cache, and then reissue the build with

emcc tests/wasm_worker/hello_wasm_worker.c -o a.html -sWASM_WORKERS

then the missing symbol does appear in the build:

So I suspect this issue has something to do with FROZEN_CACHE and/or the embuilder mechanism used to prepopulate the cache.

juj · 2022-02-22T16:51:38Z

Although the whole reason that I added a new function to stack_limits.S is that I was unable to reference the existing variables __stack_base and __stack_end otherwise. I don't know if there's a way in asm syntax (at least .extern did not work..)

I'd be happy to put that function in its own wasm_worker.S file if there's a way; that might sidestep this issue.

sbc100 · 2022-02-22T18:02:59Z

Although the whole reason that I added a new function to stack_limits.S is that I was unable to reference the existing variables __stack_base and __stack_end otherwise. I don't know if there's a way in asm syntax (at least .extern did not work..)

I'd be happy to put that function in its own wasm_worker.S file if there's a way; that might sidestep this issue.

I think you don't need direct access to __stack_base and __stack_end if you use the emscripten_stack_set_limits function?

emscripten_stack_set_limits is global (.globl in the asm syntax) whereas __stack_base and __stack_end are not marked as .globl (we can keep it that way I hope)

sbc100 · 2022-02-22T18:04:09Z

Once you switch to REQUIRED_EXPORTS instead of EXPORTED_FUNCTIONS then wasm-ld will fail at link time and make this kind of issue easier to debug

juj · 2022-02-22T18:16:05Z

Ok, now I am able to reproduce locally. If I follow the same sequence of embuilder build commands that the CI does and then freeze the cache, then I get the same bad behavior on my Windows box. I should hopefully be able to diagnose this locally now.

sbc100 · 2022-02-22T18:18:27Z

Can you share the code you used for "(current pthreads minimal runtime hello world)". I'd be curious to re-run it today and see what size it is.

juj · 2022-02-23T00:15:51Z

Can you share the code you used for "(current pthreads minimal runtime hello world)". I'd be curious to re-run it today and see what size it is.

I think I measured emcc tests/pthread/hello_thread.c -o a.html -pthread -sMINIMAL_RUNTIME --closure 1 -Oz -s ENVIRONMENT=web,worker or something along those lines.

juj · 2022-03-06T15:45:24Z

Maybe we should not include the stub pthread library at all when wasm workers are enabled. Then any usage of pthread functions would result in undefined symbols which I think is what we would want.

This is a good idea, but I'll leave this kind of work for later; it looks like there is a lot more of musl that could be stripped away in this kind of scenario.

sbc100 · 2022-02-22T14:55:18Z

system/lib/compiler-rt/stack_limits.S

@@ -77,6 +77,35 @@ emscripten_stack_get_free:
  PTR.sub
  end_function

+#ifdef __EMSCRIPTEN_WASM_WORKERS__
+# TODO: Relocate the following to its own file wasm_worker.S, but need to figure out how to reference
+# __stack_base and __stack_end globals from a separate file as externs in order for that to work.


You can call emscripten_stack_set_limits from the other file to set both __stack_base and __stack_end without needing direct access to the symbols.

This structure is used intentionally/specifically to avoid doing multiple JS -> Wasm function calls at initialization. Just one is enough.

What I mean is that emscripten_wasm_worker_initialize can call emscripten_stack_set_limits. That's just one native assembly function calling another. No JS involved.

Ah I see.. Well I don't think it is necessary to do that, that would take away from the compactness of that function call, which is packed tightly with the use of local.tee to initialize all the variables.

sbc100 · 2022-03-07T20:25:15Z

Oops, I didn't actually mean to click that approve button yet.

tlively · 2022-03-07T20:29:41Z

system/lib/wasm_worker/library_wasm_worker.c

+	emscripten_wasm_wait_i32(&addr, 0, nsecs);
+}
+
+void emscripten_lock_init(emscripten_lock_t *lock)


Instead of creating a whole new set of emscripten_* functions as alternatives to the pthread API, what if we wrote and linked in an alternative implementation of the pthread API? That would make it easier to port code to run inside a Wasm Worker.

If we wanted to improve the ability to port code to target wasm workers, I think the route to go about that is to implement a new pthread library implementation that backs on top of wasm workers, not one that would replace exposing wasm workers.

Not exposing a direct API here would be a detriment to the simplicity that is gained with exposing direct web primitives, and also to generated code size. When one talks through an existing native POSIX API that does not match the web primitives, some API inefficiency and cognitive inefficiency are unfortunately unavoidable.

The existing Pthread library is already all about the ease of porting. In the past decade we worked a lot to make it behave as 1:1 to native Pthreads as possible. Portability is therefore not a big target for Wasm Workers; directly exposing what the web offers is, along with the simplicity and minimal code size gained as a result.

So it could be an interesting experiment to see what a Wasm Workers based pthreads library would look like, and what would happen e.g. if pthread cancellation points model was removed from its operation altogether. But I think that kind of experiment should not affect how the wasm workers API surface area is publicized, but it is good to stand on its own.

juj · 2022-03-08T10:47:56Z

Oops, I didn't actually mean to click that approve button yet.

I see, how would you like to proceed? I think I resolved all the review points that you had put out before. Do you think there are substantially more?

I would be happy to continue iterating in the tree, and then features like #16449 can also proceed (which has been pending for a long time, since #12502), but I suppose it depends on the extent of the remaining review?

allan-bonadio · 2023-02-22T02:58:09Z

Could somebody please kill or fix this page?
https://emscripten.org/docs/api_reference/wasm_workers.html

That 'Quick Example' is a disaster. wasm_worker.h, emscripten_malloc_wasm_worker(), and -sWASM_WORKERS don't even exist.

I've literally spent hours and days trying to get wasm workers to work with Emscripten and my React app, going back and forth between emscripten_create_worker() and just creating a WW in JS, and trying to get a C++ function call to work. (My needs are simple - I really only need 1 call, then the thread spends all its time in C++ until page reload.)

sbc100 · 2023-02-22T14:14:08Z

Perhaps you are using an older version of emscripten that doesn't have wasm workers. What version are you using? (what does emcc --version say?).

WASM_WORKERS does exist in the latest version of emscripten. You can see it here:

emscripten/src/settings.js

Lines 1531 to 1535 in fbc5d5c

    
// If true, enables support for Wasm Workers. Wasm Workers enable applications
// to create threads using a lightweight web-specific API that builds on top
// of Wasm SharedArrayBuffer + Atomics API.
// [compile+link] - affects user code at compile and system libraries at link.
var WASM_WORKERS = 0;

If you have a simple use case you might prefer to use pthreads, which have been around a lot longer and are well tested. I would say that wasm workers are still more of a niche use case.

allan-bonadio · 2023-02-23T01:23:20Z

well, I just pulled a few days ago, so emcc --version says
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.7 (48a1620)

When I added '-sWASM_WORKERS' to my emcc script, I got this out:
emcc: error: Attempt to set a non-existent setting: 'WASM_WORKERS'

did you mean one of BUILD_AS_WORKER?
perhaps a typo in emcc's -sX=Y notation?
(see src/settings.js for valid values)

When I tried '--WASM_WORKERS' I got:
clang-15: error: unsupported option '--WASM_WORKERS'

-BUILD_AS_WORKER=1, I ended up using that. Says you need that for 'emscripten_create_worker()'. But I couldn't make a script that let me do emscripten in its space, so I took it out.

I've since moved on to --proxy-to-worker, and it seems to work. I do have one worker; now I just have to get it to run some code.

You should really try out the example at the top of https://emscripten.org/docs/api_reference/wasm_workers.html
Follow it exactly as it says. The example should also give the exact emcc command to use; we're beginners, after all.

sbc100 · 2023-02-23T01:49:03Z

The WASM_WORKERS feature (and setting) was added in this PR (#12833), which first landed in 3.1.8 (which was released about 11 months ago). Any version of emscripten prior to that won't work.
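
For build scripts or code that want to guard against this, the comparison is a plain three-part version check. A minimal sketch follows; demo_version_at_least() is a hypothetical helper, and how the installed version numbers are obtained (for example by parsing the output of emcc --version) is left out.

```c
#include <stdbool.h>

/* Returns true if version major.minor.tiny is at least
   req_major.req_minor.req_tiny. Hypothetical helper for illustration. */
bool demo_version_at_least(int major, int minor, int tiny,
                           int req_major, int req_minor, int req_tiny)
{
    if (major != req_major) return major > req_major;
    if (minor != req_minor) return minor > req_minor;
    return tiny >= req_tiny;
}
```

With the numbers from this thread, demo_version_at_least(3, 1, 7, 3, 1, 8) is false, while 3.1.8 and 3.1.32 both pass the check.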

allan-bonadio · 2023-02-25T01:21:12Z

I get it. I'm running 3.1.7. I'm pulling from https://github.com/emscripten-core/emsdk.git, is that not the latest?

sbc100 · 2023-02-25T01:27:11Z

If you run emsdk install latest in emsdk you should get 3.1.32.