Now that wasm2c has almost caught up to the current Wasm spec, maybe it's a good time to brainstorm about the roadmap from here and see what everyone else thinks is useful/worth prioritizing. Here are some possible items and thoughts to get the discussion going:
- A WASI implementation for Unix-ish hosts (PR wasm2c: uvwasi support #2002) -- will be awesome to have this in
- Some sort of continuous performance regression testing. Basically an operationalized/ongoing version of the benchmarks at https://kripken.github.io/blog/wasm/2020/07/27/wasmboxc.html
- Increased safety when modules link with other modules. Right now it's pretty easy to create a segfault (or worse) when one module imports something by name and another module exports something with the same name but an incompatible type, because wasm2c makes "optimistic" extern declarations for everything that's imported. We could improve this by making the wasm2c-generated code actually include the header from the imported-from module (so the C compiler will enforce type correctness), instead of making an optimistic declaration in its own header (wasm2c linking safety (enforcing import subtyping rules at compile time) #1908), or by reintroducing some sort of type mangling.
- SIMD support (PR wasm2c: simd support (v2) #2119). This is the last remaining piece to get full conformance with WebAssembly 2.0, and it would be kind of cool to say we conform to the whole spec before it's finalized. I think I've done most of the scaffolding work, but still need to implement all those instructions, ideally in a way that spits out generated code that's both (a) inlined to SSE2 intrinsics when available, but also (b) with backup implementations in pure C for everywhere else, and then (c) ideally we'd test both backends in the CI.
- Get Mozilla using the upstream wasm2c as part of the Firefox build process. Would probably be great to have a production "customer"; I know they had wanted bulk memory support (which we now have) but I think they probably also depend on a bunch of features in the UCSD/rlbox fork. We could work with Mozilla/UCSD to get them transitioned over to the main branch.
- wasm2c "one .c file per function" mode (PR wasm2c: multiple .c outputs #2146). It takes a huge amount of time and memory for gcc/clang to compile the output of wasm2c for a big program (especially with optimization) because it's just one gigantic C file. But wasm2c's output is so regular that it would be trivial to split it up into a single .c file per function (each including the same .h file as currently). This is probably much more parallelism than is even in the original program. And if the function names remain stable, then with a memoized/hashing build system, it would be possible to change or insert just one function in a gigantic program and 99% of the work of compiling the wasm2c output could be memoized. This would be super-cool. (You might worry about losing opportunities to inline, for which I think the best answers would be (a) we should do the continuous performance monitoring described above, (b) you don't have to use this option, (c) LTO, or (d) hopefully the good inlining opportunities were already taken upstream by the optimizing compiler that produced the .wasm file in the first place.)
- Speeding up wasm2c itself on large programs (PR c-writer.cc: Add local symbol prefix. #2171 for a big chunk of this). One approach might be to move away from using BinaryReaderIR (which manifests the entire program in RAM) and create a custom BinaryReaderWasm2C that could process the file in one pass in a streaming manner, just like we have BinaryReaderInterp already. This may be a bit risky because (a) now we have to make sure we hook into the validator everywhere we need to or else badness, and (b) I don't know how much the performance improvement would be, but I think w2c2 suggests this might be a fruitful route.
- Better fuzzing. I don't think we're fuzzing wasm2c right now at all. It would be nice to have a fuzz target in OSS-fuzz, ideally one that not only checks for safety violations but also tries to find disagreements between wasm2c's output (when compiled and run) vs. the WABT interpreter.
- Selectable behavior on a per-memory basis about whether OOB is hardware-checked or software-checked. For the main memory of a long-running program, it's much faster to use mprotect and the signal handler to detect OOB on a memory. But for memories that are attached transiently to some random region of memory (to give zero-copy access to a binary blob) and then detached, it's a lot of overhead to have to set up these 8 GiB mmapped/mprotected regions for everything you might ever want to point to. It would be nice to be able to tell CWriter which memories in a module should have explicit (software) OOB checks on load/store and which should rely on the MMU and signal handler. This could be done by (a) adding a field to the Memory structure in the IR (most convenient, but kind of icky since it's really a wasm2c-specific annotation), or (b) some sort of condition that depends on the debug name of the memory, or (c) maybe something in the WriteCOptions that allows the caller to indicate its preference on a per-memory basis. We're using "a" in our private branch, but if this is of more general interest, happy to find consensus on the best approach in general.
- (Added 11/8): Ability to "de-init" an individual module and remove its func types from the runtime, leaving other modules intact? (Effectively done a different way in Eliminate wasm_rt_register_func_type in wasm2c runtime API #2120.)
- (Added 4/5) Add a method to "reinstantiate" ("reset"?) a module instance without having to free and then instantiate a new one.
- Start work on wasm2rust. Not totally serious, but it would be cool if this existed.
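
To make the per-memory OOB item above a bit more concrete, here is a rough sketch of what a software-checked load might look like next to a hardware-checked one. Everything here is hypothetical: `sketch_memory_t`, its `software_checked` flag, and `sketch_i32_load` are made-up names for illustration, not wasm2c's actual runtime types or generated code. Real generated code would trap rather than return `false`, and would also need to handle byte order on big-endian hosts.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memory descriptor (NOT wasm2c's real wasm_rt_memory_t). */
typedef struct {
  uint8_t* data;         /* base of the linear memory */
  uint64_t size;         /* number of currently accessible bytes */
  bool software_checked; /* assumed per-memory flag chosen at codegen time */
} sketch_memory_t;

/* Loads a 32-bit value at `addr` in host byte order.
 * Returns false on an out-of-bounds access when the memory is
 * software-checked; a real implementation would trap instead. */
static bool sketch_i32_load(const sketch_memory_t* mem, uint64_t addr,
                            uint32_t* out) {
  if (mem->software_checked) {
    /* Explicit check, written to avoid overflow in addr + 4. */
    if (mem->size < sizeof(uint32_t) ||
        addr > mem->size - sizeof(uint32_t)) {
      return false;
    }
  }
  /* For a hardware-checked memory we would rely on guard pages and the
   * signal handler here, so no explicit comparison is emitted. */
  memcpy(out, mem->data + addr, sizeof(*out));
  return true;
}
```

With a flag like this, a transiently attached blob could be wrapped in a `sketch_memory_t` with `software_checked = true` and no guard region at all, while the module's main memory keeps the mprotect-based path.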
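
For the SIMD item above, the "SSE2 intrinsics when available, pure C everywhere else" idea could look roughly like the following for a single instruction (`i32x4.add`). This is only a sketch under assumptions: `sketch_v128` and the `sketch_i32x4_add*` functions are hypothetical names, not what PR #2119 actually emits. The `__SSE2__` macro is the one gcc and clang define when SSE2 is enabled; other compilers may need a different check.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical 128-bit vector type for illustration. */
typedef struct {
  uint8_t bytes[16];
} sketch_v128;

/* Portable fallback: lane-wise 32-bit add, wrapping modulo 2^32. */
static sketch_v128 sketch_i32x4_add_scalar(sketch_v128 a, sketch_v128 b) {
  uint32_t x[4], y[4];
  sketch_v128 r;
  memcpy(x, a.bytes, sizeof(x));
  memcpy(y, b.bytes, sizeof(y));
  for (int i = 0; i < 4; i++) {
    x[i] += y[i]; /* unsigned arithmetic wraps, matching Wasm semantics */
  }
  memcpy(r.bytes, x, sizeof(x));
  return r;
}

/* Dispatches at compile time to SSE2 when available, else to pure C. */
static sketch_v128 sketch_i32x4_add(sketch_v128 a, sketch_v128 b) {
#if defined(__SSE2__)
  sketch_v128 r;
  __m128i va = _mm_loadu_si128((const __m128i*)a.bytes);
  __m128i vb = _mm_loadu_si128((const __m128i*)b.bytes);
  _mm_storeu_si128((__m128i*)r.bytes, _mm_add_epi32(va, vb));
  return r;
#else
  return sketch_i32x4_add_scalar(a, b);
#endif
}
```

Keeping the scalar version as a separately callable function is one way to get at point (c): a CI job on an SSE2 host can compare both backends on the same inputs.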